Abstract


High-speed digital signal processing has a wide range of applications, including
radar, sonar, image processing, and target acquisition. The calculation of the Discrete
Fourier Transform (DFT) used in these applications has long been a significant
bottleneck for high-speed processing. Previous AFIT students have adopted a Prime
Factor Algorithm (PFA) method using Winograd Fourier Transform (WFT) processors.
Three WFT processors are pipelined into a system capable of computing a 4080-point
DFT on complex data approximately every 120 microseconds when operating with a 70
MHz clock.

This thesis effort addressed the design and implementation of a PFA controller chip
and the interconnecting memory modules between the WFT processors. The PFA controller
is an application specific processor that controls the flow of information in the pipeline,
interfaces to the WFT processors, monitors pipeline status, and takes corrective action in
the presence of faults. The interconnecting memory modules buffer the data coming out
of one WFT processor and going into another, allowing concurrent reading and writing.

The  PFA  controller  chip  was  designed,  simulated,  and  submitted  for  fabrication 
through  MOSIS.  Twenty-eight  16-bit  registers  store  the  pipeline  information.  An 
arithmetic/logic  unit  (ALU)  computes  data  transformations.  A  read  only  memory  stores 
the  microcode.  A  control  sequencer  sequences  through  the  proper  code  segments.  Finally, 
special  circuitry  interprets  the  fault  information  and  reconfigures  the  pipeline. 
This thesis effort included writing a microcode assembler to raise the user
interface to the AFIT-XROM silicon compiler. Raising the user's level of abstraction to
mnemonic microcode, while still providing an error free path to silicon layout, reduces
chances for error in the microcode specification. A generic microcode assembler tool was
created as an extension for use with other application specific processors. This tool
generates a microcode assembler from a word format and a translation file. The assembler
will output a file compatible with the XROM compiler, a VHDL description of the
XROM, a listing file, a reference file, and a reverse assembly. This tool was tested on two
other AFIT theses and a computer architecture class.

A prototype memory chip was designed and fabricated in 3 micron CMOS through
MOSIS to test the 1-transistor memory cell, the wordline selectors, and the sense
amplifiers. Simulations predict an access time of 10 ns. A larger memory was designed,
simulated, and submitted for fabrication through MOSIS. It contains storage for 272
words of 32 bits each. It is dual ported and permits concurrent reading and writing of
24-bit data. The memory also includes error control circuitry for single error correction and
double error detection.


DESIGN  AND  IMPLEMENTATION  OF  A 


VLSI  PRIME  FACTOR  ALGORITHM  PROCESSOR 

CHAPTER  1 

Introduction 

1.1.  Background 

The  military  has  a  demonstrated  interest  in  high-speed  digital  signal  processing 
(DSP).  Digital  signal  processing  is  used  in  a  wide  range  of  applications  including  radar, 
sonar,  image  processing,  voice  processing,  artificial  intelligence,  and  target  acquisition. 
Applications  such  as  these  require  data  from  sample  points  to  be  processed  as  near  to 
real-time  as  possible  with  a  reasonable  amount  of  resolution.  A  reasonable  amount  of 
resolution  is  determined  by  the  importance  of  the  application  and  how  accurate  the 
results must be. Since resolution increases with the number of points sampled, it is
logical to increase the number of sample points. Unfortunately, as the number of sample
points increases, the number of operations increases as the square of the number of
sample points (i.e., $O(N^2)$).

Many current DSP applications involve computation of the Discrete Fourier
Transform (DFT) which uses only a finite set of sampled signal values instead of the
original analog signal. Using a finite set of points allows the signal to be processed and
stored on a digital computer. The DFT, in summation form, is as follows:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (1.1)$$

where $W_N = e^{-j(2\pi/N)}$. Since the number of computations grows as
$O(N^2)$ when computed as a sequence of inner products, the DFT is not usually computed
directly. Instead, a class of algorithms developed in 1965 by Cooley and Tukey [Coo65]
is often used, which capitalizes on the symmetry in the DFT to reduce the complexity of
computation. This class of algorithms, known as Fast Fourier Transforms (FFTs),
reduces the number of operations from being proportional to the square of the number of
sample points, as in the direct DFT, to the number of sample points times the log of the
number of sample points (i.e., $O(N \log_2 N)$). The introduction of the FFT made many
DSP applications possible on digital computers.
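
As an illustration only, the direct evaluation of Equation (1.1) can be sketched in a few lines of Python. The function names `direct_dft` and `direct_multiplies` are hypothetical and are not part of the thesis; the sketch simply makes the quadratic cost visible as one inner product of length N for each of the N outputs.

```python
import cmath


def direct_dft(x):
    """Evaluate Equation (1.1) directly: one inner product per output point."""
    n_points = len(x)
    w = cmath.exp(-2j * cmath.pi / n_points)  # W_N = e^{-j(2*pi/N)}
    return [
        sum(x[n] * w ** (n * k) for n in range(n_points))
        for k in range(n_points)
    ]


def direct_multiplies(n_points):
    """Complex multiplications in the direct method: N per output, N outputs."""
    return n_points * n_points
```

For the 4080-point transform discussed in this thesis, `direct_multiplies(4080)` gives 16,646,400 complex multiplications, which is the cost the fast algorithms are designed to avoid.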

Prior to the introduction of very large scale integration (VLSI), most DSP
computations were performed on-line by general purpose computers, requiring large amounts of
time, or off-line by special purpose banks of circuit boards using medium-scale or
small-scale integration (MSI or SSI, respectively). Because the throughput and
blocklength requirements were so high, real-time computation was not possible. The increased
speed and density now available in VLSI will allow certain DSP functions to be implemented
on a single chip.

One way to further increase the speed of the DFT is to reduce the number of
multiplications, since these are the most time-consuming operations. In 1978, Winograd showed a way to
reduce the number of multiplications of a DFT while keeping the number of additions
approximately equal. He also proved that his class of algorithms, known as Winograd
Fourier Transform Algorithms (WFTs), contains the minimum number of multiplications
required for computation of a DFT [Win78]. Winograd's large algorithms, however, do
not easily map into VLSI because of their size. But, when the small algorithms are
combined with the Good-Thomas Prime Factor Algorithm (PFA), they easily map into VLSI
due to their small size and regularity [Lin84].

In 1985, a 4080-point transform was chosen as a representative system by the AFIT
VLSI Design Group. The system consists of three WFTs of lengths 16, 15, and 17
pipelined together with interconnecting memory modules and a PFA controller. The block
diagram of the system is shown in Figure 1. The goal of the WFT-PFA project is to
produce a real-time signal processing system. With the PFA operating at 20 MHz and
each of the WFTs operating at 80 MHz, one 4080-point transform will be computed every
120 microseconds.
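
The decomposition works because 15, 16, and 17 are pairwise relatively prime and 15 x 16 x 17 = 4080. The following Python sketch shows the general Good-Thomas idea under stated assumptions: it uses the textbook index maps (a sum of scaled indices on the input side and the Chinese Remainder Theorem on the output side), plain short DFTs in place of Winograd modules, and hypothetical function names. It is not the hardware data path described in this thesis.

```python
import cmath
from itertools import product
from math import prod


def dft(x):
    """Plain short DFT, standing in here for a Winograd module."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]


def pfa_dft(x, factors):
    """DFT of length prod(factors); factors must be pairwise relatively prime."""
    n = prod(factors)
    assert len(x) == n
    # Input map: multi-index (n1, n2, ...) -> sum(ni * N/Ni) mod N.
    grid = {}
    for idx in product(*(range(f) for f in factors)):
        src = sum(i * (n // f) for i, f in zip(idx, factors)) % n
        grid[idx] = x[src]
    # One short DFT along each axis; no twiddle factors are needed.
    for axis, f in enumerate(factors):
        transformed = {}
        for idx in grid:
            if idx[axis] != 0:
                continue  # handle each line of the array exactly once
            line = [grid[idx[:axis] + (m,) + idx[axis + 1:]] for m in range(f)]
            for k, value in enumerate(dft(line)):
                transformed[idx[:axis] + (k,) + idx[axis + 1:]] = value
        grid = transformed
    # Output map: Chinese Remainder Theorem reconstruction of k.
    out = [0j] * n
    for idx, value in grid.items():
        k = sum(
            ki * (n // f) * pow(n // f, -1, f) for ki, f in zip(idx, factors)
        ) % n
        out[k] = value
    return out
```

Calling `pfa_dft(x, (3, 4, 5))` on a 60-point sequence gives the same result as `dft(x)` up to rounding error; the same call with factors `(15, 16, 17)` corresponds to the 4080-point case, although the reference `dft` is slow at that size in pure Python.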

1.2.  Problem  Statement 

The  first  problem  of  this  thesis  effort  will  be  to  design  the  memory  modules  and 
implement one in 3-micron CMOS. The second problem will be to design the Prime
Factor Algorithm controller chip and implement it in 3-micron CMOS.

1.3.  Scope 

This  thesis  will  include  the  design  and  implementation  of  the  interconnecting 
memory modules and the PFA controller. First, the memory module will be
architecturally specified, laid out in VLSI, and sent for fabrication; second, the PFA controller will
be  architecturally  specified,  laid  out  in  VLSI,  and  sent  for  fabrication;  third,  both  chips 
will  be  tested  to  ensure  proper  operation  and  validation. 




1.4.  Approach 

This thesis will include the design and implementation of the interconnecting
memory modules and the PFA controller. First, the problem requirements and their
impact on the design will be analyzed. Second, the architectural descriptions for both
chips will be developed. Third, the architectural descriptions will be translated into
gate-level descriptions and from gate-level descriptions into VLSI. Fourth, the completed
designs will be simulated to verify the design. Fifth, the simulated designs will be sent
for fabrication through MOSIS.
1.5. Summary of Current Knowledge

1.5.1. Digital Signal Processing. The current state of fast signal processing
algorithms was forged in 1965 by Cooley and Tukey [Coo65]. They demonstrated a
method for computing N-point discrete Fourier transforms that required on the order of
$O(N \log N)$ computations instead of $O(N^2)$. Their algorithm took advantage of the
symmetry and periodicity inherent within DFTs to reduce the number of operations. This
method, known as the Fast Fourier Transform or FFT, brought signal processing to
digital computer computation.

Another important contribution to the field was Winograd's work published in 1978
[Win78]. Winograd presented algorithms that significantly reduced the number of
multiplications required while keeping the number of additions approximately equal.
Winograd also showed that his method required a minimal number of multiplications.
Because multiplications were much more computationally intensive than additions, his
method significantly reduced the computation time.

For a VLSI implementation, however, Winograd's algorithm lacked the modularity
needed for effective VLSI design and the number of additions grew quickly as the
transform size increased [Lin84]. One solution to this problem was given by Burrus, who
combined the Good-Thomas Algorithm (PFA) [Goo71] with small Winograd transforms
[Bur83]. This combination took one-dimensional transforms and broke them into smaller
multi-dimensional transforms. Finally, Linderman [Lin84] presented a way to embed the
PFA into VLSI using Winograd Fourier Transform Algorithm (WFT) processors. His
solution consisted of decomposing a 4080-point transform into three Winograd processors
of lengths 15, 16, and 17.


In 1985, four theses were dedicated to the implementation of Linderman's solution.
Taylor presented the PFA and WFT theory, the overall signal processing architecture,
and the numerical precision results [Tay85]. He showed that the architecture would indeed
compute a DFT and remain within acceptable numerical accuracy. Coutee described the
arithmetic circuitry for the WFT 16-point transform processor [Cou85]. Rossbach
presented the control circuitry for the WFT 16-point processor [Ros85]. He was able to
demonstrate that the control sequencer operated correctly at speeds in excess of 60 MHz in
3-micron CMOS. Finally, Collins presented a description and validation of the WFT 16-point
processor in the VHSIC (Very High Speed Integrated Circuit) Hardware Description Language, VHDL
[Col85].

In 1986, two theses were involved in the PFA-WFT project [She86]. Shephard
completed the VLSI design and implementation of the WFT16 chip and Hedrick discussed
the memory modules and the PFA controller. Hedrick established the foundations of the
PFA-ASP by describing the major functional blocks and their interfaces.

1.5.2. Memory Techniques. Weste and Eshraghian classified memories into
three types: random access, serial access, and content addressable [Wes85]. Random
access memory, at the chip level, was described as having an access time independent of
location. This contrasts with serial access and content addressable access where the time
needed to read or write a value was variable depending on location. Random access
memories (RAMs) may be further classified into read only (ROM) or read and write
(usually referred to as RAM). ROMs are generally denser than RAMs, but they are not as
flexible due to the permanency of the data.

RAMs and ROMs can be further divided according to whether they are static or
dynamic. In a static memory, the value of the data is stored with some type of latch,
whereas in a dynamic memory, the value is stored as charge on a capacitor.
Because the value in a dynamic memory is stored on a capacitor, its charge will degrade and
must be refreshed within a certain time interval. The advantage of dynamic memories is
that they are much smaller than their static counterparts and use less power [Muk86].
Dynamic memories are smaller because there is no feedback circuitry within the memory
cell to keep the location refreshed. The disadvantages are that they are hard to design,
somewhat slower than static memories, and more susceptible to soft errors (errors caused
by transient radiation).

Several authors described a dynamic memory consisting of a single transistor
[Gla85, Muk86, Wes85]. In this type of memory cell, the charge was stored on a capacitor
and a single transistor acted as a gate between the bit line and the capacitor. The value
was written and read via the bit line. Various techniques for making the capacitor have
been implemented including a double polysilicon method and a trench method [Rid79].
These methods tried to increase the amount of capacitance in the memory cell by
exploiting non-standard techniques.

The most difficult part of a one-transistor memory cell is the sense amplifier. When
a cell is read, the resultant value on the bit line is determined by charge sharing between
the bit line and the storage capacitor. Because the storage capacitor is so small
compared to the bit line, when the cell is read, there is little voltage change in the bit line.
To detect this small change, a very sensitive amplifier is needed. A sense amplifier will
amplify the change to levels that correspond to digital logic values.
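
The size of the change that must be sensed follows from conservation of charge. The sketch below is a first-order illustration only; the function name and the capacitance and voltage values in the usage note are hypothetical and are not taken from the thesis design.

```python
def bit_line_swing(v_cell, v_bit_line, c_storage, c_bit_line):
    """Voltage change seen on the bit line after charge sharing.

    Total charge is conserved when the access transistor connects the
    storage capacitor to the (precharged) bit line.
    """
    total_charge = c_storage * v_cell + c_bit_line * v_bit_line
    v_final = total_charge / (c_storage + c_bit_line)
    return v_final - v_bit_line
```

For example, an assumed 50 fF storage capacitor holding 5 V, connected to an assumed 1 pF bit line precharged to 2.5 V, gives `bit_line_swing(5.0, 2.5, 50e-15, 1000e-15)`, a swing of roughly 0.12 V. The larger the bit line capacitance relative to the cell, the smaller the signal left for the sense amplifier.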

In 1985, Shinn designed a double stage differential amplifier with current mirror
active loading [Shi85]. He detected a differential of 0.01 volts when reading the value
and its complement. Grebene fully described the transfer characteristics and gain of such
an amplifier in 1984 [Gre84]. He found that the gain was directly proportional to the
current loading and the width to length ratio of the gate on the nMOS devices to which
the voltages to be sensed were connected.

1.5.3. Error Control Coding. In the transmission or storage of data, noise or
other factors may cause erroneous results. In 1949, Shannon demonstrated that a proper
encoding of the information could reduce the number of errors induced by a channel
[Sha49]. Peterson and Weldon pointed out that as early as 1956, systems were being
built that demonstrated error correction and detection [Pet72].

A typical storage system with error correction and detection consists of encoding
the data before storage or transmission, the storage or transmission medium, and decoding
to output the data [Hed86]. With this kind of system, two different types of codes are
popular: convolution codes and block codes. In a convolution code, the encoded data is
based on the current data as well as previous data, thus requiring a storage medium
associated with the encoder. In a block code, the encoded data only depends on the current
information [Hed86].

Since  digital  computers  deal  with  information  coded  in  binary  digits,  the  discussion 
of block codes can be limited to those with two symbols [Lin83]. Lin and Costello
developed  algorithms  for  block  codes  of  this  type  using  syndromes  and  standard  arrays. 
Their  algorithms  will  accomplish  single  error  correction  and  double  error  detection. 

1.6. Materials and Equipment

The materials and equipment needed for this thesis are available at AFIT. All the
computer aided design (CAD) tools require a UNIX environment. The tools from the
AFIT/VLSI CAD system will include CSTAT, a tool that determines whether nodes can
be set to logic 1, set to logic 0, affect the outputs, or be affected by the inputs;
STOVE, a circuit extraction tool; NOFEED, a tool which removes feedback paths for
simulation; and FIXROM, a tool that alters the XROM for simulation. The other CAD
tools are from the University of California at Berkeley. These tools are distributed each
year to various academic institutions. The tools necessary for this thesis include MAGIC,
a VLSI layout tool; MEXTRA, a circuit extraction tool; ESIM, a switch level simulator;
CRYSTAL, a performance analyzer for VLSI circuits; SPICE, a timing analysis tool; and
CIFPLOT, a tool to plot VLSI circuits. MAGIC requires either a SUN Workstation or
an AED 767 graphics terminal, both of which are currently available.

This thesis will also require the use of the VHDL language. This language is
supported on AFIT's Classroom Support Computer (CSC) operating under the VAX/VMS
operating  system. 

A  high-speed  VLSI  chip  tester  will  also  be  needed.  The  tester  is  located  in  building 
125  Area  B. 

1.7.  Sequence  of  Presentation 

Chapter  2  gives  a  detailed  analysis  of  the  requirements  needed  for  the  two  problems 
of  this  thesis.  First  the  memory  is  discussed  and  then  the  PFA  controller. 

Chapter  3  presents  the  architectural  descriptions  for  the  memory  chip  and  the  PFA 
controller  chip.  It  also  discusses  the  algorithms  involved  in  the  error  control  coding  for 
the  memory  and  the  algorithms  involved  in  the  operation  of  the  PFA  controller. 

Chapter 4 discusses the AFIT CAD environment and the VLSI design methodology.
This chapter also discusses the development and operation of the Generic Microcode Assembler.

Chapter  5  presents  the  VLSI  description  of  the  circuits  used  to  implement  the 
memory and the PFA controller. The development of the microcode is also presented.

Chapter 6 discusses the results of the thesis effort. The features of the memory chip
as well as the PFA controller will be discussed. This chapter will also discuss the results
of the microcode assembler on this thesis effort and two other applications.

Chapter  7  discusses  the  conclusions  from  this  work  and  presents  recommendations 
for  future  work.  The  future  work  will  focus  on  testing  the  parts  of  the  pipeline  and 
implementing  the  prototype. 


CHAPTER  2 


Detailed  Analysis  of  the  Problem 


2.1.  Overview 

The scope of this thesis effort is to design the memory modules and the PFA
controller chip. The memory modules are used as a buffer between the different stages of
the WFT-PFA pipeline. They allow concurrent reading and writing so that the WFT
processors may operate asynchronously with respect to each other and keep data flowing
through the pipeline as it is needed and generated. The memory also incorporates single
error correction and double error detection based on an (n, k) linear systematic block code
to correct some simple errors and provide some error monitoring for the PFA controller.

The  PFA  controller  operates  and  monitors  the  pipeline.  It  is  responsible  for  the 
pipeline  data  flow,  fault  monitoring  and  reconfiguration,  if  necessary,  and  communication 
with the output host. The PFA controller is considered to be an application specific
processor containing elements to store and manipulate data, sequence through a set of
predetermined  states,  and  communicate  with  outside  activities. 

2.2.  Memory 

There are six major areas in the memory design. These are the interfaces, the data
flow, the storage cell, reading and writing, address selection, and error control coding.
The memories act as buffers for the data as it travels through the pipeline. Thus, there are
predetermined interfaces to which the memories must conform, as set by the previously
designed WFT processors.


2.2.1. Interfaces. The first constraints imposed on the memory chip are
those of the external interfaces. The PFA controller interfaces with the memory via a
LEFT(RIGHT) signal. This signal determines which side of the memory is written to and
which side is read from. Additionally, the memory is required to send two signals to the
PFA controller for error monitoring. The first signal is the Error Control Code
Correctable (ECCC) and the second is the Error Control Code Uncorrectable (ECCU). The
ECCC signal flags the PFA controller that a single error occurred and was corrected.
The ECCU signal flags the PFA controller that a double error was detected and therefore
the data could be flawed. These signals will be discussed more in Chapter 5. The WFT
processors interface with several groups of signals: the address select lines, the input data
lines, the output data lines, and PRECHARGE. There are 12 address select lines capable
of addressing up to 4096 words. Each word is 24 bits long, giving 24 input data lines and
24 output data lines. The WFT processor on the left, or the host in the case of the first
memory, feeds the 12 write select addresses and the input data to the chip, and the WFT
processor on the right, or the host in the case of the last memory, feeds the 12 read select
addresses and receives the output data.

2.2.2. Data Flow. The basic data flow through the chip is as follows: for the
input data, the inputs come into the chip, pass through the ECC encoding, and are
written into the memory; for the output data, the values from the memory are passed
through the ECC decoding and then sent off-chip to the WFT processor. To support
concurrent reading and writing by two processors, two banks of the memory must exist
so that while one bank is written the other bank is read. To accomplish this, the
memory chip must be able to route data from the encoding circuitry to either side of the
memory and from either side of the memory to the decoding circuitry. Additionally, the
PRECHARGE signal from the processors must also be routed to the side of the memory
the respective processor is using.

2.2.3. Memory Cell. The memory cell holds the value of the input data for
later retrieval. Ideally, the node should be able to store data for more than a
millisecond. Two types of memory cells can be used, static or dynamic. In a static memory
cell, loss of data due to leakage is not a problem. Static memories, however, are large
compared to dynamic memories. Additionally, dynamic memories with no refresh are
much more dense and less complicated than static memories. Without refreshing, however,
data must be read within a certain time interval or it is lost. One of the main results of this thesis
will be to determine whether the memory cell will indeed hold the value within the time
requirements.


2.2.4. Reading and Writing. The WFT processors operate at a clock
frequency of 80 MHz, outputting a new word every other clock cycle. This means that a
complete read or write must be accomplished at 40 MHz, or one operation every 25 ns, for an
input/output bandwidth of $9.6 \times 10^{8}$ bits per second or total bandwidth of $1.92 \times 10^{9}$ bits
per second.
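
The figures above follow directly from the clock rate and word width. A short Python check, using only the values stated in this section (the constant names are chosen here for illustration):

```python
WFT_CLOCK_HZ = 80e6      # WFT processor clock frequency
CLOCKS_PER_WORD = 2      # a new word every other clock cycle
WORD_BITS = 24           # data word width

word_rate_hz = WFT_CLOCK_HZ / CLOCKS_PER_WORD   # reads (or writes) per second
cycle_time_ns = 1e9 / word_rate_hz              # time available per operation
port_bandwidth = WORD_BITS * word_rate_hz       # bits per second, one direction
total_bandwidth = 2 * port_bandwidth            # concurrent read and write
```

The factor of two in the total reflects the requirement that one bank be written while the other is read.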

A  complete  write  includes  inputting,  encoding,  address  selection,  and  value  storage. 
A  complete  read  includes  address  selection,  value  sensing,  decoding,  and  outputting. 

2.2.5. Address Selection. Each read or write must be able to access any one
of the 4096 words in a non-linear fashion. The order of accesses is determined by the
Prime Factor algorithms used to compute the Winograd Fourier Transform. Address
selection is also included in the 25 ns access time. Several different approaches to address
selection will be discussed in Chapter 5.


2.2.6. Error Control Coding. As described in Chapter 1, the Error Control
Coding is used to provide error correction and detection. The ECC included on the
memory chip must be able to provide single error correction and double error detection for
the 24 bits of data. The ECC to implement this consists of four functional blocks. The
first block is used to encode the data before it is written into the storage array. The
input data and the parity bits are then written into the memory. The second block is
used to decode the output data into the syndrome bits. The third block takes the
syndrome bits, computes the error vectors, and generates the ECCC and ECCU signals
discussed earlier. The fourth block takes the error vectors and the output data and
generates the data output to the WFT processors.
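
To make the four blocks concrete, the following Python sketch uses an extended Hamming code, a common single error correcting, double error detecting block code. This is an assumption made for illustration: this section does not specify which (n, k) code the memory chip uses, and the 30-bit word the sketch produces for 24 data bits need not match the chip's stored word. The function names are hypothetical.

```python
def parity_bit_count(k):
    """Smallest r such that 2**r >= k + r + 1 (Hamming bound for k data bits)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def is_data_position(pos):
    """Positions that are not powers of two carry data bits."""
    return (pos & (pos - 1)) != 0


def encode(data_bits):
    """Block 1: add Hamming parity bits plus one overall parity bit (index 0)."""
    k = len(data_bits)
    r = parity_bit_count(k)
    n = k + r
    word = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if is_data_position(pos):
            word[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        word[p] = sum(word[pos] for pos in range(1, n + 1) if pos & p) % 2
    word[0] = sum(word) % 2  # make the parity of the whole word even
    return word


def decode(word):
    """Blocks 2-4: syndrome, error flags, and corrected data."""
    n = len(word) - 1
    syndrome = 0
    for pos in range(1, n + 1):  # block 2: syndrome is XOR of set positions
        if word[pos]:
            syndrome ^= pos
    overall = sum(word) % 2
    eccc = eccu = False
    fixed = list(word)
    if overall == 1 and syndrome <= n:  # block 3: odd parity, single error
        eccc = True
        fixed[syndrome] ^= 1            # syndrome 0 means the overall bit itself
    elif overall == 1 or syndrome != 0:
        eccu = True                     # detected but not correctable
    data = [fixed[pos] for pos in range(1, n + 1) if is_data_position(pos)]
    return data, eccc, eccu             # block 4: data output plus flags
```

With a 24-bit input, `decode(encode(data))` returns the data with both flags clear; flipping any one stored bit sets the ECCC flag and still returns the original data, while flipping any two bits sets the ECCU flag.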

2.3.  PFA  Controller 

As stated earlier, the PFA controller must be able to process instructions and take
appropriate action on conditional data to accomplish its three major tasks. The first of
these tasks is the ability to sequence through a set of predetermined control states, the
second is the ability to store and manipulate data, and the third is the ability to
communicate with off-chip activities. The requirements for each of these major tasks will be
described below.

2.3.1. Control State Sequencing. To control the total state of the system,
the controller must be able to sequence through a set of predetermined states. These
states control the functioning of the pipeline, the storing and manipulation of the data,
and the pipeline configuration. These states are expressed through the use of
microinstructions. The microinstructions are stored in a read-only memory for execution. The
control sequencer determines which of the microinstructions will be executed next based
on the current microinstruction. The source of the next microinstruction can either be an
external address, the next sequential address, a field from the current instruction, or
the top of the stack. The sequencer determines the source from a set of input conditions
and certain fields from the current microinstruction.
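
The next-address selection described above can be sketched behaviorally as follows. The four sources come from the text; the opcode names (`"start"`, `"jump"`, `"jump_if"`, `"return"`) and the function names are hypothetical placeholders and do not represent the actual microinstruction format of the PFA controller.

```python
from enum import Enum


class NextSource(Enum):
    """The four next-address sources named in the text."""
    EXTERNAL = "external address"
    SEQUENTIAL = "next sequential address"
    FIELD = "field from the current microinstruction"
    STACK = "top of the stack"


def choose_source(op, condition):
    """Pick a source from a (hypothetical) opcode field and one input condition."""
    if op == "start":
        return NextSource.EXTERNAL
    if op == "return":
        return NextSource.STACK
    if op == "jump" or (op == "jump_if" and condition):
        return NextSource.FIELD
    return NextSource.SEQUENTIAL


def next_address(current, source, external, field, stack):
    """Return the address of the next microinstruction."""
    if source is NextSource.EXTERNAL:
        return external
    if source is NextSource.FIELD:
        return field
    if source is NextSource.STACK:
        return stack.pop()
    return current + 1
```

A conditional branch, for instance, resolves to the address field when the tested condition is true and to the next sequential address otherwise.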

2.3.2.  Storing  and  Manipulating  Data.  Subject  to  the  control  states,  the 
PFA  controller  must  be  able  to  store  and  manipulate  data.  The  controller  must  be  able 
to  store  data  which  is  passed  from  one  stage  of  the  pipeline  to  another  after  each  DFT 
computation. It must also store information regarding pipeline configuration,
catastrophic pipeline failure, memory faults, WFT processor faults, error location, and
individual problem status.

The information that must pass from one stage of the pipeline to the other is the
scale factor. The scale factor is a 3-bit number indicating the number of scale bits on the
data words. A 3-bit scale factor is input along with the data to each WFT processor.
After the WFT has completed the DFT it passes a new 3-bit scale factor. This scale
factor is passed along to the next WFT processor in the pipeline. The total number of scale
bits is accumulated in the PFA controller for each problem as it travels through the
pipeline. Thus, the PFA controller must be able to store a 5-bit number for each problem in the
worst case (111 + 111 + 111 = 10101).
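
The worst-case width can be checked with a few lines of Python, assuming (as the text implies) that each of the three pipeline stages can contribute at most the largest 3-bit value. The constant names are chosen here for illustration.

```python
STAGES = 3                 # WFT processors in the pipeline
SCALE_FACTOR_BITS = 3      # width of each stage's scale factor

max_per_stage = (1 << SCALE_FACTOR_BITS) - 1    # 0b111 = 7
worst_case_total = STAGES * max_per_stage       # 7 + 7 + 7 = 21 = 0b10101
bits_needed = worst_case_total.bit_length()     # register width required
```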

In addition to storing the data, the controller must also have the ability to
manipulate it. Information from one source may need to be transferred to other sources
or used in determining future controller states. This includes loading values from the
pads, driving values to the pads, shifting a storage location, comparing two storage
locations, and adding storage locations. For example, it will be necessary to increment the
number of faults stored for a particular processor each time it fails.
2.3.3. Communication. The controller must be able to interface with the
pipeline  it  operates  and  the  host  which  submits  DFTs  for  computation  and  receives  the 
results.  The  following  is  a  list  of  those  signals  which  the  PFA  controller  must  be  able  to 
input  or  output: 

Memory  Chips: 

1. FLIP (from controller) - Used to determine which side of the memory is read from
or written to. One signal for all memory chips.

2. ECCC (to controller), Error Control Code Corrected - Used to indicate that a
single error occurred in a read operation from the memory and it was corrected. Each
memory has its own ECCC.

3. ECCU (to controller), Error Control Code Uncorrected - Used to indicate that an
error occurred on a read that could not be corrected. Each memory has its own
ECCU.

WFT  processors: 

1. WFTop (from controller), WFT Operate - Used to put the WFT processors into
computation mode. All WFT processors share one WFTop signal.

2. WD (from controller) - Used to put the WFTs in the operational or watchdog
mode. Each pipeline stage has its own WD signal. (The signals described here in 2,
3, and 4 share the state/scale bus. This will be explained in Chapter 5.)

3. SZ0 and SZ1 (from controller), Size0 and Size1 - Used to send the DFT size to the
WFTs. Each pipeline stage has its own SZ0 and SZ1.

4. SC0, SC1, SC2 (bi-directional), Scale Factor 0, 1, 2 - Used to send the input data
scaling factors and receive the output data scaling factors. Each pipeline stage has
its own SC0, SC1, SC2.

5. PE (to controller), Parity Error - Used by the WFTs to signal that a parity error
exists in the input data. Each pipeline stage has one PE signal.

6. WDerr (to controller), Watchdog Error - Used by the WFTs to signal that a
Watchdog Error has occurred. Each WFT processor has a WDerr signal.

7. WFTdone (to controller), WFT Done - Used to signal that the WFT has finished
a DFT. Each pipeline stage has one WFTdone signal.

8. LOAD (from controller), WFT Load - Used to signal a WFT processor to receive
information on operation, scaling, and DFT size. Each WFT processor has its own
LOAD signal.

HOST  (these  signals  will  be  explained  in  Chapter  5): 

1.  PFAop (to controller), PFA Operate

2.  PFAdone (from controller), PFA Done

3.  LOADSTUFF (to controller), Load/Read PFA

4.  Hs4, Hs3, Hs2, Hs1, Hs0 (to controller), Storage Select

5.  H15-H0 (from controller), Internal Data Output Lines


CHAPTER  3 


Architecture  and  Algorithms 


3.1.  Overview 

This chapter discusses the architectures and algorithms used in the memory and the
PFA controller. First, the memory architecture is presented, followed by the algorithms
for the error control coding. Second, the general architecture for the
PFA controller is discussed, then the algorithms used to operate the controller and
the WFT processors are developed, and finally, the resulting architecture is
shown.

3.2.  Memory 

The memory can be broken into three main sections: data flow, storage, and error
control coding. The data flow section deals with controlling the flow of data through the
chip. The storage section discusses the actual storage cells and how they are selected, written
to, or read from. Finally, the error control coding section explains how single error
correction and double error detection are accomplished.

3.2.1.  Data Flow.  As previously mentioned, the memory must be partitioned
into two halves to allow concurrent reading and writing. This partitioning is shown in
Fig. 2. The data initially enters the memory chip from the input pads; from there
the data is passed through the input ECC circuitry and then to either memory side. A
control signal determines to which side of the memory the data is written. After a side
of the memory is read, also determined by the control signal, the values output by the
memory arrays pass through the output ECC circuitry and then off-chip through the
output pads.

Figure 2  Memory Chip Partitioning
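
The two-sided arrangement can be sketched as a small model. This is only an illustration under assumptions: the class and method names are invented here, the choice of which side is written when the control signal is 0 is arbitrary, and the ECC circuitry is left out.

```python
class PingPongMemory:
    """Two-sided buffer: one side is written while the other side is read."""

    def __init__(self, words: int = 4096):
        self.sides = [[0] * words, [0] * words]
        self.flip = 0  # models the control signal that selects the sides

    def write(self, address: int, value: int) -> None:
        self.sides[self.flip][address] = value      # writer uses the selected side

    def read(self, address: int) -> int:
        return self.sides[1 - self.flip][address]   # reader uses the opposite side

    def toggle(self) -> None:
        self.flip ^= 1                              # swap the roles of the two sides
```

A value written before a toggle becomes visible to the reader only after the toggle, which is what lets one stage fill a side while the next stage drains the other.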


3.2.2.  Storage.  The storage area consists of several units: the word select,
the memory cell, and the sense amplifier. The word select unit inputs the address lines
and determines which of the 4096 words will be selected for the read or write operation.
A memory word consists of 32 storage cells activated for reading or writing by a single
word select line. When the value is read, the sense unit will detect the stored memory
value and amplify it for further use in the chip.
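
As a rough illustration of the word select function, the sketch below derives the number of address lines from the 4096-word figure and models the select lines as a one-hot mask. The names are hypothetical and the model says nothing about the actual decoder circuit.

```python
WORDS = 4096
CELLS_PER_WORD = 32

ADDRESS_BITS = (WORDS - 1).bit_length()  # address lines needed to pick one word


def word_select(address: int) -> int:
    """Return a one-hot mask with exactly one word select line raised."""
    if not 0 <= address < WORDS:
        raise ValueError("address out of range")
    return 1 << address
```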


3.2.3.  Error Control Coding.  Error control coding is accomplished in two
steps. The first step is to encode the input data and the second step is to decode the data
from the memory before sending it off-chip. The next two sections outline the development
of a systematic block code described by Lin and Costello [Lin83]. The matrices for the
code were developed by Major Prescot in 198a.

3.2.3.1.  Encoding.  For this memory, a message is defined as an input word of
24 bits and denoted by u, where $u = (u_0, u_1, \ldots, u_{23})$. With 24 bits, the maximum number
of distinct messages is $2^{24}$, or about $1.68 \times 10^7$. The encoder transforms u into an n-tuple, v,
where n > 24. The mapping of u to v is one to one and the set of all vectors v is the
block code. Additionally, the $2^k$ code words (k = 24) form a k-dimensional subspace of
the vector space of all the n-tuples over the Galois field GF(2). Twenty-four linearly
independent code words, $g_0, g_1, \ldots, g_{23}$, can be found from the block code such that
every code word in the block code is a linear combination of the 24 code words. The
resulting relationship between u and v is

$$ v = u_0 g_0 + u_1 g_1 + \cdots + u_{23} g_{23} $$

The 24 linearly independent code words can be arranged into a (24 x n) matrix, G, so that

$$
G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{23} \end{bmatrix}
  = \begin{bmatrix}
      g_{0,0} & g_{0,1} & \cdots & g_{0,n-1} \\
      g_{1,0} & g_{1,1} & \cdots & g_{1,n-1} \\
      \vdots & \vdots & & \vdots \\
      g_{23,0} & g_{23,1} & \cdots & g_{23,n-1}
    \end{bmatrix}
$$

G is considered to be the generator matrix for the linear block code. If
$u = (u_0, u_1, \ldots, u_{23})$ is the word to be encoded, then the code word, v, is given as:

$$
v = u \cdot G
  = (u_0, u_1, \ldots, u_{23}) \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{23} \end{bmatrix}
  = u_0 g_0 + u_1 g_1 + \cdots + u_{23} g_{23} \qquad (3.3)
$$


A variation on the linear block code is the linear systematic block code. A linear
systematic block code divides the code word into two parts, the message part and the
redundant part. The message part consists of the original input bits and the redundant
part consists of parity-check digits which are linear sums of the input bits. For a linear
systematic (n, 24) code, the 24 x n G matrix has the following form:

$$
G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{23} \end{bmatrix}
  = \left[\begin{array}{cccc|cccc}
      p_{0,0} & p_{0,1} & \cdots & p_{0,n-25} & 1 & 0 & \cdots & 0 \\
      p_{1,0} & p_{1,1} & \cdots & p_{1,n-25} & 0 & 1 & \cdots & 0 \\
      \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
      p_{23,0} & p_{23,1} & \cdots & p_{23,n-25} & 0 & 0 & \cdots & 1
    \end{array}\right]
$$

where each $p_{i,j}$ is a binary digit. Let P denote the first part of the generator matrix and $I_{24}$
denote the 24 x 24 identity matrix. Then $G = [P \; I_{24}]$. The code word now becomes

$$
v = (v_0, v_1, \ldots, v_{n-1}) \qquad (3.5)
$$

$$
v = (u_0, u_1, \ldots, u_{23}) \cdot G
$$

The last 24 bits of the code word correspond exactly to the original 24 bits in the input
word and the first n-24 bits are the redundant parity-check bits. The equations used to
compute the first n-24 bits are called parity-check equations and are of the form

$$
\begin{aligned}
v_0 &= u_0 + u_7 + u_8 + u_{14} + u_{16} + u_{21} \qquad (3.7a) \\
v_1 &= u_0 + u_1 + u_9 + u_{15} + u_{17} + u_{22} \qquad (3.7b) \\
v_2 &= u_1 + u_2 + u_8 + u_{10} + u_{18} + u_{23} \qquad (3.7c) \\
v_3 &= u_2 + u_3 + u_9 + u_{11} + u_{16} + u_{19} \qquad (3.7d) \\
v_4 &= u_3 + u_4 + u_{10} + u_{12} + u_{17} + u_{20} \qquad (3.7e) \\
v_5 &= u_4 + u_5 + u_{11} + u_{13} + u_{18} + u_{21} \qquad (3.7f) \\
v_6 &= u_5 + u_6 + u_{12} + u_{14} + u_{19} + u_{22} \qquad (3.7g) \\
v_7 &= u_6 + u_7 + u_{13} + u_{15} + u_{20} + u_{23} \qquad (3.7h)
\end{aligned}
$$

The code word v can now be expressed as:

$$ v = (v_0, \ldots, v_7, u_0, \ldots, u_{23}), $$

where $v_0, \ldots, v_7$ are from Eqs. 3.7a-h and $u_0, \ldots, u_{23}$ are the original input word u.
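
The encoding step can be sketched in a few lines. This is a minimal sketch, not the chip's circuit: the tap table is the set of data-bit indices as read from Eqs. 3.7a-h (cross-checked against the syndrome equations 3.11a-h), addition is modulo 2, and the names `PARITY_TAPS` and `encode` are illustrative.

```python
# Data-bit indices feeding each parity bit v0..v7, as read from Eqs. 3.7a-h.
PARITY_TAPS = [
    (0, 7, 8, 14, 16, 21),    # v0
    (0, 1, 9, 15, 17, 22),    # v1
    (1, 2, 8, 10, 18, 23),    # v2
    (2, 3, 9, 11, 16, 19),    # v3
    (3, 4, 10, 12, 17, 20),   # v4
    (4, 5, 11, 13, 18, 21),   # v5
    (5, 6, 12, 14, 19, 22),   # v6
    (6, 7, 13, 15, 20, 23),   # v7
]


def encode(u):
    """Return the 32-bit code word (v0..v7, u0..u23) for a 24-bit message u."""
    if len(u) != 24:
        raise ValueError("message must have 24 bits")
    parity = [sum(u[i] for i in taps) % 2 for taps in PARITY_TAPS]
    return parity + list(u)
```

For example, a message with only $u_0$ set yields parity bits $v_0 = v_1 = 1$ and all others 0, since $u_0$ appears only in Eqs. 3.7a and 3.7b.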


3.2.3.2.  Decoding.  Once v has been stored in the memory it may be exposed
to conditions which cause errors to appear. Because of the extra parity bits, the decoding
scheme will correct single errors and detect double errors.

Let $r = (r_0, r_1, \ldots, r_{31})$ be the word read out of the memory. This word may or
may not be different from the word initially stored. Now let e be the vector sum of the
code word, v, and r such that

$$ e = v + r = (e_0, e_1, \ldots, e_{31}), \qquad (3.8) $$

where $e_j$ is the boolean exclusive-or of $v_j$ and $r_j$. Alternately, r may be represented as the
vector sum of e and v.

It is necessary to introduce another matrix associated with the block code. This
(n-k) x n matrix H, commonly referred to as the parity-check matrix, is defined so that
any vector in the row space of G is orthogonal to the rows of H and any vector that is
orthogonal to the rows of H is in the row space of G. Thus, a codeword generated by G
solves the equation $v \cdot H^T = 0$. The parity-check matrix may be written as follows:

$$
H = [\, I_{n-k} \;\; P^T \,] =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & p_{0,0} & p_{1,0} & \cdots & p_{23,0} \\
0 & 1 & 0 & \cdots & 0 & p_{0,1} & p_{1,1} & \cdots & p_{23,1} \\
0 & 0 & 1 & \cdots & 0 & p_{0,2} & p_{1,2} & \cdots & p_{23,2} \\
\vdots & & & & \vdots & \vdots & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & p_{0,7} & p_{1,7} & \cdots & p_{23,7}
\end{bmatrix}
$$

where $P^T$ represents the transpose of the matrix P.

When r is read, the decoder computes the following:

$$ s = r \cdot H^T = (s_0, s_1, \ldots, s_7). \qquad (3.10) $$

This equation defines the syndrome, s, of r. Because $v \cdot H^T = 0$, r is a valid codeword if
and only if s = 0. If $s \neq 0$ then the word read has been corrupted by errors. If the error
vector e is identical to a nonzero codeword then e+v represents a valid code word and $r \cdot H^T = 0$.
This condition is considered an undetectable error. Since there exist $2^{24}-1$ nonzero
codewords, there exist the same number of undetectable error patterns. The $H^T$ matrix for
our (32,24) block code is shown in Appendix A.


The corresponding syndrome digit equations are:

$$
\begin{aligned}
s_0 &= r_0 + r_8 + r_{15} + r_{16} + r_{22} + r_{24} + r_{29} \qquad (3.11a) \\
s_1 &= r_1 + r_8 + r_9 + r_{17} + r_{23} + r_{25} + r_{30} \qquad (3.11b) \\
s_2 &= r_2 + r_9 + r_{10} + r_{16} + r_{18} + r_{26} + r_{31} \qquad (3.11c) \\
s_3 &= r_3 + r_{10} + r_{11} + r_{17} + r_{19} + r_{24} + r_{27} \qquad (3.11d) \\
s_4 &= r_4 + r_{11} + r_{12} + r_{18} + r_{20} + r_{25} + r_{28} \qquad (3.11e) \\
s_5 &= r_5 + r_{12} + r_{13} + r_{19} + r_{21} + r_{26} + r_{29} \qquad (3.11f) \\
s_6 &= r_6 + r_{13} + r_{14} + r_{20} + r_{22} + r_{27} + r_{30} \qquad (3.11g) \\
s_7 &= r_7 + r_{14} + r_{15} + r_{21} + r_{23} + r_{28} + r_{31} \qquad (3.11h)
\end{aligned}
$$

Thus, we have computed the syndrome of r, which is equal to $r \cdot H^T$. But r = (v+e),
so that $s = (v+e) \cdot H^T = v \cdot H^T + e \cdot H^T$. However, $v \cdot H^T = 0$, so that the following
relationship is established: $s = e \cdot H^T$. Because of the above relation it is possible to compute e
from s. The reader is referred to Chapter 3 of Lin and Costello [Lin83] for the proof.
The error vector, e, is computed from the matrix in Appendix A. Appendix A also shows
the 32 $e_i$ equations. Each $e_i$ is a combination of the 8 syndrome digits so that

$$ e_i = s_0 \cdot s_1 \cdot s_2 \cdot s_3 \cdot s_4 \cdot s_5 \cdot s_6 \cdot s_7 , $$

where $s_m \cdot s_n$ represents the boolean AND function and each $s_j$ may appear true or
complemented. For example, the syndrome decoding of $e_8$ is
$e_8 = s_0 \cdot s_1 \cdot \bar{s}_2 \cdot \bar{s}_3 \cdot \bar{s}_4 \cdot \bar{s}_5 \cdot \bar{s}_6 \cdot \bar{s}_7$, where $s_j$ represents
boolean 1 and $\bar{s}_j$ represents boolean 0. From Eq. 3.8 it follows that v = e+r. Therefore,
the approximation of the original codeword is obtained by EXCLUSIVE-ORing the word
read out of memory, r, and the computed error vector, e.
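
The decoding step can be sketched in software as follows. This is a hedged illustration rather than the decoder on the chip: the lookup table stands in for the 32 $e_i$ equations of Appendix A (which is not reproduced here) and is derived from the syndrome equations 3.11a-h; the function names and the status strings are assumptions made for this sketch. Any nonzero syndrome that matches no single-bit pattern is simply reported as uncorrectable.

```python
# Data-bit indices (u0..u23) in each parity equation; data bit u_i is stored as r_(i+8).
PARITY_TAPS = [
    (0, 7, 8, 14, 16, 21), (0, 1, 9, 15, 17, 22),
    (1, 2, 8, 10, 18, 23), (2, 3, 9, 11, 16, 19),
    (3, 4, 10, 12, 17, 20), (4, 5, 11, 13, 18, 21),
    (5, 6, 12, 14, 19, 22), (6, 7, 13, 15, 20, 23),
]


def syndrome(r):
    """s_j = r_j plus the data bits of parity equation j, modulo 2 (Eqs. 3.11a-h)."""
    return tuple((r[j] + sum(r[8 + i] for i in PARITY_TAPS[j])) % 2 for j in range(8))


def column(i):
    """Syndrome produced by a single error in bit position i of the stored word."""
    if i < 8:
        return tuple(int(j == i) for j in range(8))
    return tuple(int((i - 8) in PARITY_TAPS[j]) for j in range(8))


SINGLE_ERROR = {column(i): i for i in range(32)}


def decode(r):
    """Return (word, status) with status 'ok', 'corrected', or 'uncorrectable'."""
    s = syndrome(r)
    if not any(s):
        return list(r), "ok"
    if s in SINGLE_ERROR:
        fixed = list(r)
        fixed[SINGLE_ERROR[s]] ^= 1   # v = e + r for a single-bit error vector e
        return fixed, "corrected"
    return list(r), "uncorrectable"
```

A single error in $r_8$ gives the syndrome (1,1,0,0,0,0,0,0), which is the pattern selected by the $e_8$ term above.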

3.3.  PFA  Controller 

The PFA controller chip consists of two main architectural parts, the microprogrammed
control unit and the processor unit. The microprogrammed control unit is
responsible for sequencing through predetermined states for the controller and initiating
processor actions as well as data flow through the chip. The processor unit performs all
the data manipulations as well as storing the data and sending signals to the
microprogrammed control unit. In order to make the architectural description more
meaningful, it is necessary to discuss the algorithms associated with the PFA controller.
In an application specific processor, the algorithm drives the architecture.

3.3.1.  Host Algorithm.  The host must operate and communicate with the
PFA controller. Handshaking is kept to a minimum, optimally only an operate signal to
the PFA controller and a done signal from the controller. The host algorithm is shown in
Fig. 3. Initially, the host must reset the controller. This initializes the storage areas and
resets all counters and address lines. Now, the host lowers the operate signal and then
lowers the reset signal. The host can proceed to inspect any storage locations and update
them if needed. Specifically, the host must specify the initial configuration, scale factors, size,
and timeout information. Once the proper data has been set, the operate signal to the
PFA controller is raised. Since the pipeline is set with data entering from one side and
exiting from the other, there may be two different hosts, an input host and an output
host. The input host must load the new data into the first memory and send a done signal
to the PFA controller when all the data is loaded. The output host, at this time,
must unload the DFT results from the last memory in the pipeline. From the register
inspection, the output host knows whether the pipeline output data is valid or not. If the
data is not valid, the host ignores the current output data. The next event is for the
current DFT to be computed; this is signaled by the done signal from the PFA controller.
The host may then continue to operate the PFA as long as needed.
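
The start-up part of the host sequence can be sketched against a mock interface. Everything named here is hypothetical: `MockController`, its methods, the signal names, and the storage keys are stand-ins invented for illustration and do not describe the actual pin-level protocol, which is covered in Chapter 5.

```python
class MockController:
    """Hypothetical stand-in for the controller's host interface; it only records activity."""

    def __init__(self):
        self.events = []
        self.storage = {}

    def set_signal(self, name, level):
        self.events.append((name, level))

    def load(self, location, value):
        self.storage[location] = value


def host_start(ctrl, configuration, scale_factors, size, timeout):
    """Reset the controller, set up its storage, then raise the operate signal."""
    ctrl.set_signal("reset", 1)
    ctrl.set_signal("operate", 0)      # lower operate first, then release reset
    ctrl.set_signal("reset", 0)
    ctrl.load("configuration", configuration)
    ctrl.load("scale_factors", scale_factors)
    ctrl.load("size", size)
    ctrl.load("timeout", timeout)
    ctrl.set_signal("operate", 1)
```

After `host_start`, the input host would load the first memory and signal done, while the output host unloads the last memory and discards results flagged as invalid.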

At any time during the pipeline operation while the PFA is not computing, the host
may change the configuration, change the size of the DFT to be computed, implying a
hardware reconfiguration, or change the scale factors of the input data.

3.3.2.  PFA Controller Algorithm.  The high level algorithm for the PFA controller
is shown in Fig. 4. The first section deals with initializing values and data on the
chip during power up. The controller then waits for an operate signal from the host.
While the PFA controller is waiting for the operate signal, the host has access to the
internal data stored on the chip. Once the controller has received the operate signal, it
checks to see if the pipeline must be reconfigured. Reconfiguration is discussed later in
this chapter. The PFA controller then toggles the flip signal sent to the memories. The
controller then communicates with the WFT processors. Once the WFT computations
are done, the PFA sends a done signal to the host.

Figure 4  Controller Microcode Flow Chart (boxes, in order: Initialize; PFA OPERATE = 1;
RECONFIGURE?; RECONFIGURE; Flip Memories; WFT CODE; DONE = 1)

The WFT interface algorithm is shown in Fig. 5. The PFA controller must first
send the scale factors to the WFT, then send the operate signal, and then float the scale
factor lines so that the output scale factors may be returned by the WFT processors. The
controller then waits for four done signals, one from the input host and one from each
stage of WFT processors. Once all have finished, the PFA controller latches the new scale
factors from each stage and stores them internally. The controller then lowers the WFT
operate signal. The controller looks at information received from the pipeline to determine
if any errors have occurred and updates certain storage locations if need be. If an
error has occurred in the active WFT processor, the controller sets up a new
configuration.
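
The order of operations in the WFT interface can be traced with a small function. This is a sketch under assumptions: the report dictionaries, the action strings, and the function name are illustrative, and a real controller would poll the done signals (subject to the timeout) rather than return early.

```python
def wft_interface_pass(stage_reports, host_done):
    """Trace one pass of the WFT interface sequence.

    stage_reports: one dict per pipeline stage with the keys
    'done' (bool), 'out_scale' (int), and 'active_error' (bool).
    Returns (actions, latched_scale_factors, stages_to_reconfigure).
    """
    actions = ["send scale factors", "raise WFT operate", "float scale factor lines"]
    # Four done signals are required: the input host plus one per stage.
    if not (host_done and all(stage["done"] for stage in stage_reports)):
        actions.append("keep waiting")
        return actions, None, []
    latched = [stage["out_scale"] for stage in stage_reports]
    actions += ["latch scale factors", "lower WFT operate", "record errors"]
    failed = [i for i, stage in enumerate(stage_reports) if stage["active_error"]]
    if failed:
        actions.append("set up new configuration")
    return actions, latched, failed
```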

3.3.3.  Microprogrammed Control Unit.  The microprogrammed control unit
(MCU) controls the operation of the chip. It generates the control signals that operate
the pipeline, communicate with the host, or manipulate the internal data. This unit can
be thought of as two separate sections, the control memory and the microprogrammed
sequencer [Man82]. The control memory is a read only memory (ROM) whose words,
read one at a time, each represent a microinstruction. Each microinstruction contains
information needed for the chip to operate. The instruction format is broken down into logical
fields. Each of these fields defines a certain set of operations in the chip. These fields can
be horizontally or vertically coded [Man82]. To achieve a compromise between the benefits and
tradeoffs of the two approaches, the PFA microinstruction contains both.

Figure 5  WFT Interface Microcode Flow Chart

The microprogrammed sequencer determines the next address to be read out of the
ROM. The next address can come from several sources including the microinstruction
itself, an external bus, a stack used for subroutines, or simply be the next address in
sequence. The microprogrammed sequencer determines the address from a number of
conditions selected by the microcode instruction.

Figure 6 shows a typical MCU. The control memory contains several main fields.
Two of these fields determine how the next address is selected: the condition and the
branch fields. The condition field selects one of the conditions used for branching and
control that exist in the chip. The branch field selects either T or F for an unconditional
branch or call, or the positive or negative sense of the selected condition. The branch
address may come from three different places: the instruction itself, the stack, or an
external source. The stack is used as a temporary storage location for the return address
in the event of a subroutine call. When a call is executed, the address of the next
instruction is pushed onto the top of the stack. On a return, the branch address is popped off
the stack and selected as the next address. The MCU also contains an incrementer for
sequential addressing.

Figure 6  Microprogrammed Control Unit
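
The next-address selection can be illustrated with a small model. This is a generic sketch of a microprogram sequencer, not the circuit in this design: the operation names are invented, and the condition and branch fields are folded into a single boolean `condition` argument.

```python
class Sequencer:
    """Next-address logic: sequential, branch, call (push), and return (pop)."""

    def __init__(self):
        self.address = 0
        self.stack = []

    def step(self, op="next", target=None, condition=True, external=None):
        nxt = self.address + 1                     # incrementer output
        if op == "branch" and condition:
            nxt = target                           # address from the microinstruction
        elif op == "call" and condition:
            self.stack.append(self.address + 1)    # save the return address
            nxt = target
        elif op == "return" and condition:
            nxt = self.stack.pop()                 # address from the stack
        elif op == "external":
            nxt = external                         # address from an external source
        self.address = nxt
        return nxt
```

When the selected condition is false, a conditional branch, call, or return falls through to the incremented address.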

3.3.4.  Processor  Unit.  The  processor  unit  contains  the  internal  storage  for 
the  chip  and  the  circuitry  necessary  to  manipulate  the  data.  The  PFA  controller  architec¬ 
ture  is  organized  around  a  bus  structure  and  is  shown  in  Fig.  7.  The  data  is  stored  in  a 
set  of  registers  that  can  be  loaded  or  read.  The  registers  are  loaded  through  the  C  bus  or 
from  an  XROM  field.  The  particular  register  is  selected  via  a  decoder.  The  data  may  be 
read  out  on  either  the  A  bus  or  the  B  bus.  The  A  bus  and  the  B  bus  are  fed  into  the 
arithmetic logic unit (ALU). The ALU can perform 15 functions on the data and set
condition bits accordingly. For a detailed description of the ALU, the reader is referred to
the thesis of Capt. David Gallagher [Gal87].

Figure 7  Processor Unit (registers 1 through n with A, B, and C select lines; A, B, and
C buses; arithmetic logic unit; literal input from XROM)

3.3.5.  Registers.  The registers store the data for the controller. There are 28
registers in the controller. Seven of the registers have special purposes requiring such
options as being loaded from sources other than the load bus, driving the contents to
destinations other than the data busses, and shifting. The following defines the registers
used in the PFA controller and any special requirements:

ECC1-ECC3, Error Control Corrected registers.

These three registers are used as counters for the number of Error Control Code
corrected signals from the first three memories. These registers can be used for fault
monitoring.

ECU1-ECU3, Error Control Uncorrected registers.

These three registers are used as counters for the Error Control Code uncorrected
signals from the first three memories. These registers can also be used for fault monitoring.

PE1-PE3, Parity Error registers.

These three registers are used as counters for the number of Parity Errors signaled
by each stage of the pipeline.

WD11-WD13, WD21-WD23, WD31-WD33, Watch Dog registers.

These nine registers are used as counters for the number of times each WFT processor
was assigned as having a fault.

TSR, Temporary Scale Register.

This register is used to store the scale factors to be given to each stage of the pipeline.
Nine of the bits (3 sets of 3) must be able to be sent to the WFT processors.
The register must also be able to receive the scale factors from the WFT processors
and then shift those by 5 bits for the next stage of the pipeline.

PSR, Permanent Scale Register.

This register is used to store the accumulating scale factors for each of the three
problems in the pipeline. This register must be able to transfer the problem's scale
factors to the respective problem's Problem Status register. This register must also
be able to shift the scale factor bits by 5 positions.

PS1-PS3, Problem Status registers.

These three registers store information about each of the three problems in the
pipeline. They store the current accumulated scale factor, whether there was a
fault due to an active error, a memory uncorrectable error, or a parity error, and
which stage of the pipeline the error occurred in. Each register also contains a bit
signifying whether the data was validated by the processor finishing.

CCR, Current Configuration Register, and NCR, Next Configuration Register.

These two registers store the current and next configurations, respectively.

TOUT, Timeout Register.

This register contains a timeout value used to determine whether a catastrophic
pipeline failure has occurred, such as a processor not finishing.

TEMP, Temporary register.

This register is used as a scratch pad for many of the data manipulations.

3.3.6.  Microcode Word Format.  Now that all of the major architectural
components have been identified, the fields in the word format can be identified. Figure 8
shows the architecture of the PFA controller. Each field in the microcode will control the
operation of an architectural block, peripheral circuit, or direct output. The first field
controls the address sequencer for the XROM. It determines where the next address
originates. The second field specifies the operation of the ALU. The third field is used to
indicate an insertion of the literal field onto the destination register bus. This is used to
load a constant from the microword into a register. The fourth, fifth, and sixth fields
select the two source registers and the destination register, respectively, for the ALU. The
seventh field is horizontally coded and specifies certain control signals for the chip. The
eighth field signals the WFT processors to operate. The ninth field signals the host that
the DFT is done. Finally, the tenth field contains the address of the branch or a constant
to be inserted onto the destination bus. The bit fields will be described in Chapter 4.

3.3.7.  Pipeline Fault Tolerance.  The pipeline is set up for fault tolerance
using triple redundancy and voting for the WFT processors [Hed86]. In each stage, one of
the three processors is considered to be active; the other two are considered to be in
watchdog mode. When a processor operates in watchdog mode, it receives the same data
as the active processor, computes the same transform, and compares its results with the
results of the active processor at the output pads without driving the pipeline data bus. If there is
a discrepancy, a WatchDog error bit is signaled to the PFA controller. The PFA controller
looks at the three WatchDog error bits from each of the processors in each stage
and uses a voting strategy to assign the error if any one of the WatchDog error bits is
raised. The voting strategy is as follows:

Table 1:  Voting Strategy

The (1,X,X) cases are conditions that are treated as normal because the possibility of the
active processor signaling a watchdog error is too small to warrant the amount of microcode and
hardware necessary for handling this condition. A (1,X,X) case arises when the active
processor signals that it had a watchdog error. This can only
happen when the WFT has suffered a major error, the pipeline has some kind of bus
error in which the line is set high, or a transient fault has occurred. If the active processor
signals a watchdog error and the others do not, this error will be assigned to the active
processor, as it should be. The problem arises when the active processor signals a watchdog
error and one watchdog also signals an error. In this case, the error will be assigned to the
watchdog processor that did not signal the error. It is difficult to determine what the true
cause of the error was. The error could be in the active or either watchdog processor, and
this voting strategy will not find the fault. In the event that all three signal a watchdog
error, no error is assigned. The probability of this occurring is extremely small except in
a high radiation zone. Should it occur, the data would probably be corrupted
enough that the next WFT in the pipeline would most likely detect a parity error. If it
is the last stage in the pipeline, the data will be bad with no error reported.

When the PFA controller executes the code for error recording, it looks at the
watchdog errors and determines if one of the three active processors has failed. If so, it
sets up the Next Configuration Register so that the next active processor is the
current watchdog with the smallest error count. If the two watchdogs have the same
number of errors, it selects the processor with the lowest number name (i.e., WDi1 <
WDi2 < WDi3, where i represents the column). The names assigned to the processors
are shown in Fig. 9.
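
The error assignment and the choice of the next active processor can be sketched as follows. Because the body of Table 1 is not reproduced here, the voting rule in `assign_fault` is an inference from the prose cases above (one raised bit charges that processor, two raised bits charge the silent one, none or all three charge nobody); it is an assumption, not the table itself. The function names and the dictionary layout are also illustrative.

```python
def assign_fault(active, watchdog_a, watchdog_b):
    """Return the processor charged with the error, or None (inferred rule)."""
    bits = {"active": active, "watchdog_a": watchdog_a, "watchdog_b": watchdog_b}
    raised = [name for name, bit in bits.items() if bit]
    if len(raised) == 1:
        return raised[0]                       # lone signaler is charged
    if len(raised) == 2:
        return next(name for name, bit in bits.items() if not bit)
    return None                                # (0,0,0) or (1,1,1): nothing assigned


def next_active(error_counts, failed):
    """Choose the watchdog with the fewest recorded errors; ties go to the lower number.

    error_counts maps the processor number within a stage (1, 2, 3) to its count.
    """
    candidates = [n for n in error_counts if n != failed]
    return min(candidates, key=lambda n: (error_counts[n], n))
```

For instance, with counts {1: 4, 2: 0, 3: 4} and processor 2 failing, the tie between processors 1 and 3 is resolved in favor of processor 1.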


CHAPTER  4 


Computer  Aided  Design  Environment  Tools 


4.1.  Overview 

The computer aided design (CAD) environment has a great impact on the quality
and timeliness of what can be produced. A well-integrated set of tools from the design
stage through the implementation stage to the simulation stage allows easier transitions
between stages and quicker execution. AFIT currently supports a limited number of
tools for the design and implementation phases. In the following sections, the methodology
for designing a VLSI chip within the AFIT CAD environment is discussed, descriptions
of the tools needed are given, and the development of a CAD tool
created as a result of this thesis effort is described.

4.2.  Design  Methodology 

The first step in producing a VLSI chip is to decide what problem to solve. A detailed
problem specification limits the scope of what is to be done and thus limits the amount
of extra design that might not be needed. The next step is to develop an architectural
description from the problem specification. For the design parts of this thesis effort, the
architectural descriptions are extremely different. The memory chip architecture supports
a data flow architecture and the PFA controller chip supports a finite state machine
architecture. Once the architecture has been described, the logic and circuit design follow.
At this stage, the interfaces between the macrocells are defined and testability is
incorporated. The next step is the VLSI layout. Once the layout is complete, the
simulation phase begins. Simulation and verification start at the lowest level of design
and proceed up the hierarchy. When discrepancies are noted between the simulation and
the descriptions, the chip design loops back to various levels depending on where the
discrepancy occurred. Once the chip has met the simulation requirements, it is ready for
fabrication.

4.2.1.  AFIT CAD Tools.  AFIT currently supports the CAD environment
near the lower design levels. These tools currently support VLSI layout and switch-level
simulation. In addition to the software tools, AFIT has considerable computing power
and other hardware support for CAD. The hardware used in this thesis effort included:

1.  Two ELXSI 6400s, 2-CPU (12 MIP, 6 MIP), running UNIX 4.2BSD

3.  SUN2 and SUN3 Workstations running UNIX 4.3BSD

4.  Two VAX 11/785s running UNIX 4.3BSD

5.  A VAX 11/785 running VMS 4.5

6.  A Versatec Plotter, and an assortment of printers.

The current mainstay of the AFIT VLSI CAD toolset is Magic [Ost86]. Magic is an
interactive VLSI layout tool that allows creation and modification of VLSI circuits using
Manhattan circuit design geometries. For this thesis effort, Magic was run primarily on a
SUN 3 because of the window environment and interactive speed for cell editing. Magic
was run on the ELXSI 6400 when executing several features of Magic that are
computationally intensive. These included the design rule checking of a silicon compiled XROM
(described later) and hierarchical cell "flattening" of the same XROM. The XROM contains
thousands of cells, and it was found to be easier and faster to work with the XROM
"flattened."

Mextra [Fit83] was used to translate from the Caltech Intermediate Form (CIF)
to a switch-level format suitable for other simulations. Mextra takes the "cif" file, which
is a mask level description of the circuit produced by Magic, and makes a "sim" file,
which is a listing of all the transistors in the circuit and associated capacitances. In
addition to the translation, Mextra outputs several other files. These include the alias file,
the log file, and the nodes file. The alias file is very useful in finding nodes that are
"shorted" together that should not be. The log file gives information about the number of
occurrences of labels that are not connected together by stating that a certain label has
a number of occurrences. The file provides information that usually leads to finding
unwanted "open circuits" between nodes. Finally, the nodes file is a list of the node
numbers assigned by Mextra and their location on the chip. This is a useful reference when
running other tools which refer to node numbers.

Cstat,  a  CMOS  version  of  stat  [TerSS^,  was  run  on  the  output  of  Mextra.  Cstat 
provides information about nodes that cannot be affected by the inputs, nodes that cannot
affect the outputs, and nodes that cannot be set to either logic-1 or logic-0. This tool is
useful for finding nodes that are not connected or are shorted to either Vdd (logic-1) or
GND (logic-0). By the time the circuit is ready for fabrication, every node signaled by
Cstat, if any, should be accounted for.

Nofeed and Fixrom are two tools needed to allow Esim, described below, to work
properly. Nofeed scans the Mextra output to identify and remove the feedback loops
from Master-Slave Flip-Flops (MSFFs), making them dynamic. A more complete discussion
of the operation of the MSFF will be given in Chapter 5. Fixrom modifies two portions
of the XROM for Esim compatibility. The first modification is to replace the shared
drain in the XROM storage array and the second is to replace an inverter in the XROM
sense amplifier. A thorough discussion of the operation of the XROM will be given in
Chapter 5.


Esim [Ter86] is an event-driven switch simulator for nMOS or CMOS transistor
circuits. Esim is used to exercise the switch level description from Fixrom. This tool
was used to verify the logic created from the Magic layout for correct operation.
Simulation can be performed as if the chip is under test by stimulating only the inputs and
observing only the outputs, or as a diagnostic tool by stimulating any node in the circuit
and observing any node in the circuit. Once a circuit performs as expected under Esim, it
is considered ready for fabrication.

Two  tools  developed  at  AFIT  were  used  to  create  the  XROM  used  in  the  control 
section.  The  first  tool  was  the  Generic  Microcode  Assembler  Tool  (GMAT).  GMAT  was 
developed  as  part  of  this  thesis  effort  and  is  described  in  the  following  section.  The 
second  tool  was  an  optimizing  XROM  silicon  compiler.  This  tool,  given  a  list  of  integer 
values  describing  the  binary  contents  of  the  ROM,  will  minimize  the  transistor  count  and 
the  number  of  drains.  The  compiler  will  also  generate  the  Magic  layout  for  the  optimized 
ROM  including  the  word  selectors,  column  drivers,  sense  amplifiers,  precharge  circuitry, 
and  cell  arrays.


4.2.2. Generic Microcode Assembler Tool (GMAT). When designing microcode,
it is desirable to describe the code in a structured language representation using
mnemonics. Describing microcode in these terms is helpful for two reasons: first, the
code is more readable, and second, the code is less prone to errors. These factors
motivate the need for a microcode assembler. The CAD environment needs tools that can
be applied to different projects, so the assembler must be generic.

To meet the above needs, a generic microcode assembler tool was developed. This
tool takes a microcode word format description and mnemonic translation file and builds
a microcode assembler for that particular microcode format. The resulting assembler
reads in the translation file and the microcode to produce outputs that are compatible
with other CAD tools and are useful in debugging the integration of the microcode into
chip designs. Specifically, it generates an address stream used by the optimizing XROM
compiler, a VHDL description of the ROM, a reference file, and a reverse assembly file.

The assembler supports several important programming features including labels,
literals, and default fields. By supporting the use of labels, jump fields may be specified
by a label rather than an absolute address that must be put into the microcode by hand.
Supporting literals is useful when loading a location with a specified value from the
microinstruction. Supporting default values greatly improves code readability. The
microinstruction need only specify those fields where some action is to take place.

The assembler also supports several other features that are useful in the design
process. The assembler can output a file that can be fed into the XROM compiler. This
eliminates a step previously needed and subject to human error. The assembler also outputs
a reference file that shows the instruction and its translation. This is particularly helpful
in debugging the XROM connections in the chip.

4.2.3. GMAT Assembler. The assembler created by GMAT is essentially a
two-pass assembler as described by Beck [Bec8a]. On the first pass, unused parts are
stripped and branch labels are stored for access on the second pass. As the assembler
scans the input file on the first pass, it writes out an intermediate "stripped file" and a
listing file containing the input file and associated line numbers.

On the second pass the translation is accomplished. First, the assembler reads the
translation file into internal data structures. The translation file, created by the user,
defines the microword format, the fields, and the values they represent. The assembler
now reads the stripped microprogram and uses the translation tables for the symbol
substitution. When the assembler parses a line, it starts with the first symbol encountered
and searches through the list of names in the first field table. If the symbol is
found, it makes the translation, outputs the result to a data file, and then gets the
next symbol. If a symbol is not found, the assembler puts in the default value for the
field and gets the next symbol. This way the writer need not specify all fields in the
microcode. This enhances readability and avoids the risk of leaving a field out, which
would exist if every field had to be declared.

A label is treated like any other field. When the label is encountered in the word,
the assembler puts in the value of the line number at which the label appeared in the
first pass. It also translates the line number to a binary (1/0) representation with the same
number of bits as the field where it is being placed. This allows the microcode to use
labels for the branch fields instead of having to figure out the absolute location of the
branch and manually insert it into the microcode word. The assembler also allows insertion
of literals into the translation. When a pound sign is encountered, the field allowing
literals will be filled with exactly what follows the pound sign. This allows constants to
be used in the microcode. If the assembler does not find a symbol after parsing the entire
line, two conditions may have occurred. First, the symbol could have been a "nop", or
second, the symbol is an error. The "nop" represents a "no operation" instruction. The
assembler checks for a "nop" at this point. If the symbol is not a "nop", then it reports
this back to the user as an error and indicates the symbol that was not found.
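
The two passes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GMAT code (the header name assem.h suggests GMAT itself is C, and the real assembler is listed in the appendices). The two-field word format, the field widths, the ';' comment syntax, and the 'name:' label syntax are all hypothetical. The sketch also looks each symbol up across all field tables rather than walking the fields in order.

```python
# Hypothetical two-field microword: a 2-bit branch select and a 4-bit address field.
FORMAT = ["BR_SEL", "NXT_ADDR"]
TABLES = {
    "BR_SEL": {"default": "00", "symbols": {"RET": "01", "CALL": "10", "JMP": "11"}},
    "NXT_ADDR": {"default": "0000", "symbols": {}},
}
LABEL_FIELD = "NXT_ADDR"  # the one field allowed to hold labels and literals


def first_pass(lines):
    """Strip comments and blank lines; record each label's line number."""
    stripped, labels = [], {}
    for line in lines:
        line = line.split(";")[0].strip()   # ';' comments are an assumption
        if ":" in line:                     # 'name:' label syntax is an assumption
            name, line = line.split(":", 1)
            labels[name.strip()] = len(stripped)
            line = line.strip()
        if line:
            stripped.append(line)
    return stripped, labels


def second_pass(stripped, labels):
    """Translate each stripped line into a string of bits, field by field."""
    width = len(TABLES[LABEL_FIELD]["default"])
    words = []
    for line in stripped:
        # Every field starts at its default, so unspecified fields need no symbol.
        fields = {name: TABLES[name]["default"] for name in FORMAT}
        for sym in line.split():
            if sym == "nop":
                continue
            if sym.startswith("#"):         # literal: copy what follows the pound sign
                fields[LABEL_FIELD] = sym[1:]
            elif sym in labels:             # label: line number as fixed-width binary
                fields[LABEL_FIELD] = format(labels[sym], f"0{width}b")
            else:
                for name in FORMAT:
                    if sym in TABLES[name]["symbols"]:
                        fields[name] = TABLES[name]["symbols"][sym]
                        break
                else:
                    raise ValueError(f"symbol not found: {sym}")
        words.append("".join(fields[name] for name in FORMAT))
    return words
```

Calling first_pass on a short program and feeding its two results to second_pass yields one bit string per microinstruction; an unknown symbol raises an error naming that symbol, mirroring the error report described above.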

After the second pass the assembler has created a data file that holds the translation
for the entire microcode. The assembler now creates a reference file for the user.
This file lists the original instruction and, following it, the translation by fields. This
reference file is extremely useful in debugging the microcode and the chip once the ROM
is in place.

The  remaining  features  of  the  assembler  are  invoked  as  a  command  line  switch 
when  running  the  assembler.  The  user  may  generate  a  reverse  compile,  an  XROM  address 
file,  or  a  VHDL  description  of  the  ROM.  The  reverse  compile  takes  the  file  of  translated 
symbols  and  reverse  assembles  it.  If  the  code  is  correct,  the  reverse  compile  will  produce 
the  original  microcode  with  labels  removed  and  line  numbers  inserted.  The  assembler 
may  also  generate  the  XROM  addresses.  In  this  section,  the  assembler  transforms  the 
translated  file  into  a  form  compatible  with  the  XROM  optimizing  compiler.  The  assem¬ 
bler  separates  the  word  into  four  parts  and  computes  the  integer  value  of  the  resulting 
binary  number.  These  four  integers  are  then  put  into  the  XROM  addresses  file.  The 
assembler  does  this  for  all  words  in  the  microcode.  The  final  option  currently  supported 
is  a  VHDL  description  of  the  ROM.  The  assembler  generates  a  VHDL  package  that  holds 
the  ROM  and  defines  how  the  XROM  is  interfaced.  The  interface  allows  words  to  be  read 
from  the  ROM.  The  ROM  is  represented  as  an  array  of  bit  strings.
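
The XROM address step can be illustrated with a short Python sketch. The text does not say how the word is divided, so splitting it into four equal-width parts is an assumption, as is the function name.

```python
def xrom_addresses(word_bits, parts=4):
    """Split a translated microword into `parts` slices and return each as an integer.

    Equal-width slices are an assumption; the thesis only states that the word
    is separated into four parts.
    """
    if len(word_bits) % parts != 0:
        raise ValueError("word length must be a multiple of the number of parts")
    size = len(word_bits) // parts
    return [int(word_bits[i:i + size], 2) for i in range(0, len(word_bits), size)]
```

Applied to every word of the microcode in turn, this would give the list of integers that the optimizing XROM compiler takes as its input.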

4.2.3.1. Translation File. The translation file contains the microcode word
format and the field definitions, as well as their translation. The following is a
representative section of the translation file used for the PFA controller:

BR_SEL ALU LOAD_FD REG REG REG SPEC_FUNCT WFTOP PFADONE NXT_ADDR #

BR_SEL   000000000
RET      000000001
CALL     000000010
JMP      000000011
CALLCR   000010110
JnPE3    111111111

LOAD_FD  0
LOAD     1

REG      00000
ECCC1    00001
ECCC2    00010

The first line of the file must contain the microword format, ended with a
semi-colon. The end-of-line delimiter is used in case a format description is longer than
one line. Fields in the microword format must appear below in the translation file. If a
literal marker follows a field in the format description, this field might contain a literal.
If a label marker follows a field, this field might contain a label. In the above example,
the NXT_ADDR field contains both. For the PFA controller this field is the jump field as
well as the literal field.

The  field  definitions  that  follow  the  format  description  are  separated  by  one  blank 
line  and  end  with  the  last  field.  The  first  line  of  a  field  definition  consists  of  the  field 
name  followed  by  its  default  value.  Blank  spaces  are  used  to  separate  the  values,  not 
tabs.  The  remaining  lines  in  a  field  definition  specify  the  sub  fields  and  their  translation. 
In the above example, the REG field default value is 00000 and the value of register
ECCC1 is 00001. When the assembler encounters the symbol ECCC1, it will place 00001 in
the translated file.
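
As an illustration of the layout just described, the following Python sketch reads a translation file into a format list and per-field tables. It assumes the format description ends at the first semi-colon and that field definitions are separated by single blank lines, as stated above. It ignores the literal and label markers, and the function name and the sample text are hypothetical.

```python
def parse_translation(text):
    """Return (format_fields, tables) parsed from translation-file text."""
    header, _, body = text.partition(";")   # format description ends at ';'
    fmt = header.split()
    tables = {}
    for block in body.strip().split("\n\n"):  # one blank line between fields
        rows = [line.split() for line in block.splitlines() if line.strip()]
        name, default = rows[0]               # first row: field name and default
        tables[name] = {
            "default": default,
            "symbols": {sym: bits for sym, bits in rows[1:]},
        }
    return fmt, tables


SAMPLE = """BR_SEL LOAD_FD REG ;

BR_SEL 000000000
RET 000000001
CALL 000000010
JMP 000000011

LOAD_FD 0
LOAD 1

REG 00000
ECCC1 00001
ECCC2 00010
"""
```

Running parse_translation(SAMPLE) gives a table in which the REG field has default 00000 and the symbol ECCC1 maps to 00001, matching the worked example in the text.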

4.2.4. PREG Operation. PREG is the interface to the user's microcode.
PREG  reads  the  microword  format  and  scans  the  translation  file  to  build  those  sections 
of  code  for  the  assembler  that  are  user  specific.  The  code  segments  generated  are  then 
written  to  files  that  are  added  into  the  assembler  when  it  is  compiled. 

PREG  first  scans  the  translation  file  for  the  microword  format  and  which  fields  may 
contain  labels  or  literals.  It  then  scans  the  translation  file  and  records  the  names  of  the 
fields  and  the  lengths  of  the  bit  translation  fields.  The  bit  lengths  are  needed  when  mak¬ 
ing  the  code  that  generates  the  reference  file  where  the  translated  microinstruction  fields 
are  separated  by  a  space  for  readability.  The  length  of  the  label  field  is  also  needed  so 
the  translation  from  the  line  number  to  the  binary  representation  of  the  proper  bit 
length  can  be  done. 
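
The use of the recorded bit lengths when writing the reference file can be sketched as follows. The function name and the example widths are hypothetical; the real routine is part of the generated assem.tailored file.

```python
def split_fields(word_bits, widths):
    """Separate a translated microword into its fields, one space between fields."""
    if sum(widths) != len(word_bits):
        raise ValueError("field widths do not add up to the word length")
    pieces, pos = [], 0
    for width in widths:
        pieces.append(word_bits[pos:pos + width])
        pos += width
    return " ".join(pieces)
```

For a hypothetical 9-bit word with field widths 2, 4, and 3, the word 110000101 would be listed as three space-separated groups, which is the readable form the reference file aims for.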

After having scanned the translation file, PREG builds the user-specific code. The
first file created, assem.h, is a header file containing definitions needed to implement the
code. These definitions include the word length in bits, the label field length in bits,
structures for the fields found in the translation file, and integers to record the
number of subfields for each field definition. Because the assembler dynamically assigns
these values when it reads the translation file, the number of subfields for each field may
change without needing to rerun PREG.

The next file created by PREG is the assem.tailored file. This contains the routines
to read the translation file, translate the microcode, make the reference file, and perform
the reverse assembly. The routine to read the translation file reads the fields in the order
found when PREG scanned the translation file. The routine to translate the microcode
uses the microword format and the names included in the translation file. The reference
file routine uses the bit lengths to separate the fields when the reference file is created.
The reverse assembly routine parses the translated file into fields and then translates the
fields backwards into mnemonics, except for the jump field, which is replaced with a line
number.

To make a tailored assembler, the user runs gmat, a shell script that runs PREG
and compiles the resultant assembler. Appendix B shows the gmat shell script. Appendix
C shows the code for PREG. Appendix D shows the two files created when GMAT was
run for the PFA controller. Appendix E shows the code for the assembler skeleton. The
translation file for the PFA controller is shown in Appendix F.

CHAPTER  5


VLSI  Design 


5.1.  Design  Techniques 

Before describing the VLSI implementation, it is necessary to explain several
different VLSI design techniques used. These are 2-phase clocking, design with
transmission gates (t-gates), and master-slave flip-flops (MSFFs).

Two-phase clocking uses two non-overlapping clocks to synchronize
operations on the chip [Wes85]. A timing diagram is shown in Fig. 10. The two clocks,
PQ1 and PQ2, each have separate operations. The inputs to all logic units that are
synchronized with the clocks become valid on the rising edge of PQ1 (the falling edge of
precharge, if precharging is used), starting all computations. The outputs are latched on
the falling edge of PQ2, ending all computations. Two-phase clocking is useful for
synchronization in the circuit. It also prevents some signals from racing through flip-flops
and destroying the intended sequencing.


Figure 10. 2-Phase Clocking Timing Diagram (diagram labels: PRECHARGE, Useful Computation Time)


The second design style is the use of transmission gates (t-gates). The t-gate is the
CMOS switch, the equivalent of the pass transistor in nMOS logic. A t-gate is shown in
Fig. 11(a). The transmission gate is made up of a p-transistor and an n-transistor in
parallel. Both types of transistors are needed because a p-transistor will not pass a strong
logic-0, that is, it passes no lower than approximately 1.7 V, and an n-transistor will not
pass a strong logic-1, that is, it passes no higher than approximately 3.3 V. By using both,
a good switch with both a strong logic-1 and a strong logic-0 is obtained. The control for
the t-gate is supplied to the n-transistor and its complement is supplied to the p-transistor,
so that both transistors are on when control is high on the n-transistor and its complement
is low on the p-transistor. When using the t-gate symbol, only the signal to the n-transistor
is shown, as in Fig. 11(b). The complemented input to the p-transistor is still needed, but
not shown.

Figure 11. Transmission gate

A MSFF is shown in Fig. 12. The MSFF is the basic storage unit. The input is
latched on the falling edge of PQ2 and remains in the first feedback loop until the rising
edge of PQ1, at which time it moves into the storage area on the right, where it is latched
on the falling edge of PQ1. The feedback loops keep the value of the nodes as long as
power is supplied to the circuit or until a new value is loaded. When PQ1 rises, the value
is output to the circuitry using the stored signal.

5.2.  Memory  Chip 

The  memory  chip  includes  the  storage  cells  for  the  bits,  the  encoding  circuitry  for 
the  error  correcting,  the  decoding  circuitry  for  the  error  correcting,  as  well  as  other  cir¬ 
cuits  for  bitline  control,  word  selection,  bitline  detection,  and  one-shot  generation.  Fig¬ 
ure  13  shows  the  chip  architecture  including  all  the  major  components. 

5.2.1. Memory Cell. The memory cell is based on a one-transistor cell design
shown in Fig. 14(a). The one-transistor design was chosen to increase the density of the
memory. The cifplot of the memory cell is shown in Fig. 14(b). Each memory cell actually
holds two bits, one associated with the wordline above and the other with the wordline
below. This was done for several reasons including modularity, density, and capacitance
reduction. Modularity is obtained by designing the cell to be arrayed in any direction.
The density is obtained because the cell is so tightly packed. The final reason, and
the most important, is the reduction of the bitline capacitance. The sources of all the
memory storage cell transistors are attached to the bitline. If transistor source sharing
were not used, there would be n sources on the bitline, where n is the number of words in
the array. In this implementation, there are only n/2 sources. The reason for keeping the
capacitance of the bitline as small as possible is related to reading a memory value.
Reading is based on charge sharing between the bitline and the storage node. The
wordlines run horizontally across the cell in polysilicon and second metal. The polysilicon
and the second metal are shunted together at both sides of the array to decrease the
effective resistance of the polysilicon, thereby decreasing the access time. The bitlines run
vertically in first metal. The actual memory storage area is an n-diffusion area connected
to the drain of the pass transistor on the bitline. A grounded polysilicon plate sits above
the area of n-diffusion. The charge for the cell is stored on the capacitor created between
the polysilicon and the n-diffusion. When the wordline is raised, the precharged bitline
will either maintain its value because the charge stored on the capacitor was high and no
charge sharing takes place, or experience a decrease in charge because the value on the
polysilicon plate was low and charge sharing between the bitline and the cell takes place.
The decrease will be less than 0.02 volts because the capacitance on the bitline is so
much greater than the capacitance of the memory cell.

Figure 14. One-Transistor Memory Cell Logic and Cifplot
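
The size of the bitline drop follows from charge conservation: when the cell is connected, the total charge on the bitline and cell capacitances is redistributed, so the final voltage is the capacitance-weighted average of the two starting voltages. The Python sketch below works this through. The capacitance and voltage values used with it are illustrative assumptions only, since the thesis does not give them in this section.

```python
def bitline_after_read(v_bit, v_cell, c_bit, c_cell):
    """Bitline voltage after charge sharing with one storage cell.

    Charge is conserved: (c_bit + c_cell) * v_final = c_bit * v_bit + c_cell * v_cell.
    Capacitances may be in any unit as long as both use the same one.
    """
    return (c_bit * v_bit + c_cell * v_cell) / (c_bit + c_cell)


# Assumed, illustrative values: bitline precharged to 5 V, cell storing 0 V,
# bitline capacitance 300 times the cell capacitance.
drop = 5.0 - bitline_after_read(v_bit=5.0, v_cell=0.0, c_bit=3000.0, c_cell=10.0)
```

With these assumed numbers the drop is a few hundredths of a volt at most, which is why a sense amplifier is needed to detect a stored logic-0, and why a smaller bitline capacitance (fewer sources on the bitline) gives a larger, easier-to-sense signal.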


5.2.2. Bitline Control. The bitline is pulled up and down for reading and
writing. Figure 15 shows the circuitry for the bitline control. The bitline is precharged
before every read and before writing a logic-1, and pulled down before and during writing
a logic-0. The three signals that determine the state of the bitline are
Write_enable (Left from PFA), precharge (PQ1 from WFT), and bit (data to be written
from WFT). These three signals control the bitline_precharge (active low) and
bitline_pulldown signals. The above is summarized in Table 2.

Figure 15. Bitline Control Circuitry

Table 2: Bitline Control Logic

5.2.3. Sense Amplifiers. The sense amplifier design is shown in Fig. 16. The
sense amplifier is used to detect a slight drop in the bitline voltage if the memory cell
stored a logic-0. The sense amplifier is based on a differential voltage amplifier. A dummy
voltage reference is used for the comparison. The dummy bitline is connected to a
column of memory cells that always have a stored value of 2.5 volts from a voltage
divider. Vref is used to maintain a constant current in the amplifier. When a logic-1 is
read and the bitline voltage does not change, it will be higher than the dummy bitline.
This will cause its transistor to be fully on, causing the above node's voltage to drop.
The p-transistor above the dummy is turned on, pulling the output toward logic-1.
When the bitline is lower than the dummy, more current will be drawn through the
dummy transistor, pulling the output to logic-0.

5.2.4. Word Selection. The word selector is based on a NOR of the address
bits. The design is shown in Fig. 17. The NOR approach was chosen over the NAND
approach because the resistance of the consecutive gates in the NAND design increases the
time needed for the output to change. The inputs are selected for the NOR as either
address or complemented address in such a way as to prevent pulldown when the word is
selected. The NOR had previously been pulled up by precharge_shot, and the address
select lines are gated with precharge_shot to prevent inadvertent pulldown of a NOR
output while it is being pulled up. If the NOR output is logic-1 when precharge transitions
to logic-1, the value of the NOR passes through the NAND gate. This output is inverted to
become the

5.2.6.1. Encoding. The encoder circuit generates the eight parity bits added
to the word for error correction and detection. Each of the eight parity bits is derived
from a logic function based on six different bits of the input word. Each parity bit is a
unique combination of six input bits. The equations were described in Chapter 3. In
boolean logic the addition operation is the same as the XOR logic function. Thus each

Figure 19. Syndrome Bit Generator

The computation of the error vector bits may be done two different ways: the eight
syndrome bits could be fed into a Programmable Logic Array (PLA), or each error vector
bit could be computed using custom logic. The PLA method is not the most efficient
because each error vector bit is the sum of only one product. This would leave a great
deal of area not being utilized. Custom logic is thus the implementation choice. To
increase the ease of design, a basic cell is used and then personalized with smaller cells to
select the desired logic configuration. The logic for the error bit generator is similar to
the address decoder except that no gating is done with precharge. Finally, the error vector
bits are fed into a 2-input XOR gate to produce the final output as shown in Fig. 20.
The other inputs are the bits read out of the memory. The ECC is designed to produce
error control for all 32 bits (24 data, 8 parity), but the 8 parity bits are not used outside
of the chip, so there is no need to correct them. This reduces the number of XOR gates
to 24.
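
The correction step can be sketched in Python on a reduced example. Each error vector bit is a single product term, so it is 1 only when the syndrome matches the one pattern assigned to that data bit, and the corrected data is the stored data XORed with the error vector. The four data bits and three-bit syndrome patterns below are hypothetical stand-ins; the actual design uses 24 data bits and eight syndrome bits with patterns fixed by the thesis's parity equations, which are not reproduced here.

```python
# Hypothetical syndrome pattern for each data bit (not the thesis's real code).
PATTERNS = ["011", "101", "110", "111"]


def error_vector(syndrome):
    """One product term per bit: 1 only if the syndrome equals that bit's pattern."""
    return [int(syndrome == pattern) for pattern in PATTERNS]


def correct(data_bits, syndrome):
    """XOR each data bit read from memory with its error vector bit."""
    return [d ^ e for d, e in zip(data_bits, error_vector(syndrome))]
```

An all-zero syndrome leaves the data untouched, while a syndrome equal to one bit's pattern flips exactly that bit.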

5.2.6.3. ECCC and ECCU. The Error Correcting Code Corrected (ECCC) signal sent
to the PFA is determined based on the syndrome bits. If the number of syndrome bits at
logic-1 is one or two, then a single error was found and can be corrected. Thus the
ECCC signal can be implemented using a PLA with 24 product terms of 8 bits each.
The Error Correcting Code Uncorrected (ECCU) signal sent to the PFA is also based on
the syndrome bits. If the number of error vector bits at logic-1 is greater than two, then a
double error was detected and may be corrected. The correction cannot be guaranteed,
however. Thus the ECCU signal could be implemented using a PLA with 2^7 - 24 = 104
product terms of eight bits each. A PLA with 104 product terms is prohibitively large for
any application. These signals can instead be implemented very easily using analog circuit
techniques. The circuit is shown in Fig. 21. When one or two of the error vector
bits are high, the input to the inverters will drop no lower than 2.3 volts. This will
trigger the top inverter, designed to switch at 2.5 volts. When more than two of the error
vector bits are raised, the inputs to the inverters will drop below 2 volts. The bottom
inverter is designed to trigger at 2 volts. The logic is summarized as follows:

Figure 21. ECCC/ECCU Circuitry

Table 4: ECCC and ECCU Determination

A  B  Result
0  0  no errors
0  1  not possible
1  0  ECCC
1  1  ECCU
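
The determination logic can be written out as a small Python function that maps the number of error vector bits at logic-1 to the two inverter outputs and the resulting status. Treating A as the top inverter output and B as the bottom inverter output is an assumption made for this sketch, as is the function name.

```python
def ecc_status(bits_high):
    """Return (A, B, result) for a given count of error vector bits at logic-1."""
    a = 1 if bits_high >= 1 else 0   # assumed: top inverter trips once any bit is high
    b = 1 if bits_high > 2 else 0    # assumed: bottom inverter trips above two bits
    # B can only be 1 when A is 1, so the (0, 1) row of the table never occurs.
    result = {(0, 0): "no errors", (1, 0): "ECCC", (1, 1): "ECCU"}[(a, b)]
    return a, b, result
```

Stepping the count from zero upward walks through the reachable rows of the table in order: no errors, then ECCC for one or two bits, then ECCU for three or more.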


5.2.7. Switching Circuitry. The switching circuitry is used to control the
flow of the data to the memory arrays. The data flow is shown in Fig. 22. The word
select lines must come from two different sources, one source for reading and the other
for writing. The input data is channeled into the memory side being written and the
output data is extracted from the side of the memory being read. The signal that
accomplishes all of the necessary multiplexing is the Write_enable signal generated by the
PFA controller as Left.

5.3. PFA Chip

The PFA controller has three major sections: the control section, the data section,
and the periphery. These are shown in Fig. 23.

5.3.1. Control Side. The control side of the PFA controller is responsible for
generating the control signals used to operate the PFA controller and the pipeline. It
does this by sequencing through a set of microinstructions stored in a read only memory.
The address of the instruction to be executed is generated by the control sequencer based
on various control signals from the current instruction and certain state variables
generated by the environment. The control section was described in Chapter 3 and is shown
again in Fig. -4. The major sections include the control memory, the next address
generator, the condition select, the stack, and the incrementer.
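
How a next address generator, a stack, and an incrementer can cooperate is sketched below in Python. This is a generic microsequencer model under stated assumptions, not the PFA controller's actual logic, which is defined in Chapter 3. The branch names are borrowed from the BR_SEL mnemonics in the translation file, the condition select is omitted, and treating any other operation as "increment" is an assumption.

```python
def next_address(pc, op, target, stack):
    """Return the address of the next microinstruction.

    pc     -- address of the current microinstruction
    op     -- branch selection: "JMP", "CALL", "RET", or anything else to fall through
    target -- jump field of the current microinstruction
    stack  -- list used as the return-address stack (modified in place)
    """
    if op == "JMP":
        return target
    if op == "CALL":
        stack.append(pc + 1)   # remember where to resume after the subroutine
        return target
    if op == "RET":
        return stack.pop()
    return pc + 1              # incrementer: continue with the next word
```

A CALL at address 3 to address 10 pushes 4 on the stack, and the matching RET later resumes at 4.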

5.3.2. XROM. The control memory is implemented using a read-only
memory (ROM) developed at AFIT. This ROM, the AFIT XROM, was designed by Paul
Rossbach in 1985 [Ros85] and its general structure is shown in Fig. 24. The vertical
pitch of the memory cells is such that decoders are needed on both sides of the ROM to
access all of the words. The wordlines run horizontally in polysilicon and second metal.
The horizontal pitch of a memory cell is so small compared to the sense amplifier that
some column decoding is needed. The sense amplifier is four times the horizontal pitch
needed to run a single bitline. Therefore, two of the address lines are fed into the sense
amplifiers to select one of four bitlines. Additionally, the LSB of the address lines is fed
into the column drivers at the bottom to select which bit is "A0."


5.3.2.1. XROM Memory Cell. The AFIT XROM memory cell is shown in
Fig. 25. The name XROM is derived from the "X" shape of the transistors around a
common drain [Ros85]. Before a read, each of the bitlines is precharged to logic-1 through
an n-transistor, resulting in a voltage around 3.3 volts. When a wordline is selected, the
transistors connecting the bitline with the A0 or complemented A0 address line turn on,
allowing the bitline to discharge through whichever of the two is tied to logic-0. When the
bitline is discharged to logic-0, this indicates the presence of a transistor and thus a stored
value of logic-1. If no transistor is present, the bitline will not discharge and the sense
amplifier recognizes this as a stored logic-0. If both A0 and complemented A0 are
connected to the bitline, "fighting" will occur and the bitline voltage will settle to 1.5 volts.
Since the sense amplifier is set to trigger above the n-transistor at 4 volts, it will correctly
recognize this as a logic-0.

5.3.3. XROM Sense Amplifier. The sense amplifier is used to detect the
value on the bitline and amplify it to a full 5 volts or 0 volts. The implementation is
shown in Fig. 26. Initially, the bitline is precharged through the p-transistor at the top of
the cell. The two address lines which select one of the four bitlines are already stable.
Thus at the end of precharge, the bitline below the selected n-transistor is at 3.3 volts. If
a transistor is present at the word selected, the bitline will discharge to approximately 0
volts. This causes the gate connected above the bitlines to turn the p-transistor on,
raising the output to logic-1. The two transistors before the two inverters are designed so
that they become an inverter triggered at 4 volts. The n-transistor gate is not connected
to the p-transistor as in regular inverters to minimize the capacitance on the sense line.
When no transistor is present, the bitline does not discharge, the n-transistor pulls the
output of the pseudo-inverter down, and a logic-0 is output. If two transistors were on
the bitline, it will settle at 2.5 volts below the n-transistor. This will drop the sense line
enough to turn on the p-transistor of the pseudo-inverter and output a logic-1 as expected.

Figure 26. XROM Sense Amplifier


5.3.4. XROM Pipeline Register. The XROM pipeline register sits above the
XROM sense amplifiers and was developed by Capt. David Gallagher [Gal87]. Using a
pipeline increases the utilization of the XROM. When a pipeline is used, the last word
out of the XROM is being executed while the next word is being fetched. By overlapping
the execution and fetch operations, the effective speed of the controller is doubled.
Without a pipeline, it would take one complete clock cycle to fetch the word and another
to execute; then the next word would be fetched in one clock cycle and executed in
another, and so on. With a pipeline, however, the clock cycle needed for fetching is
hidden by the execution of the last word fetched.

The pipeline works by using a modified MSFF to store the instruction from the
XROM output and isolate it while the XROM fetches the next word. Additionally, the
pipeline may be logically separated from the XROM for testing purposes. In this mode,
the pipeline becomes a shift register controllable from several chip pads.
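
The doubling claimed for the pipeline can be checked with a short cycle count. This is a minimal sketch, assuming exactly one clock cycle per fetch and one per execute and ignoring any branch effects; the function names are illustrative and do not come from the thesis.

```python
def cycles_without_pipeline(num_words: int) -> int:
    """Each word needs one fetch cycle followed by one execute cycle."""
    return 2 * num_words


def cycles_with_pipeline(num_words: int) -> int:
    """The fetch of word k+1 overlaps the execution of word k.

    Only the very first fetch is not hidden behind an execution.
    """
    return num_words + 1 if num_words else 0


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, cycles_without_pipeline(n), cycles_with_pipeline(n))
```

For long microcode sequences the ratio of the two counts approaches two, which is the sense in which the effective speed is doubled.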

5.3.5. Control Sequencer. The control sequencer, designed by Larry French
[Fre86] and modified by David Gallagher [Gal87], determines the addressing for the
XROM. It consists of five main blocks: address selection, condition select, branch select,
incrementer, and stack.

5.3.5.1. Address Selection. The address select block selects the next address
for the XROM from four sources. The first is the next address field of the current
instruction. Allowing the next address to come from an instruction enables the microcode
to branch. The second source is the top of the stack. Using a stack allows the use of
subroutines in the microcode. The stack will store the address of the instruction following
the call so that program control can return to that point. The third source is the
incrementer, allowing sequential addressing. The fourth source is an external source. This
external source is used to manually control the sequencer for testing of the controller. In
a more general processor, this source can be used to map functions from a register.
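
The four-way choice made by the address select block can be sketched as a simple multiplexer. This is only an illustration: the enumeration names, the function signature, and the 8-bit address width are assumptions made for the example and are not taken from the thesis.

```python
from enum import Enum


class AddressSource(Enum):
    NEXT_ADDRESS_FIELD = "next_address_field"  # branch target from the instruction
    STACK_TOP = "stack_top"                    # return from a subroutine
    INCREMENTER = "incrementer"                # sequential addressing
    EXTERNAL = "external"                      # manual control for testing


def next_address(source, next_address_field, stack_top, current_address,
                 external, address_bits=8):
    """Pick the next XROM address from one of the four sources.

    address_bits is a hypothetical width used only to wrap the result.
    """
    mask = (1 << address_bits) - 1
    candidates = {
        AddressSource.NEXT_ADDRESS_FIELD: next_address_field,
        AddressSource.STACK_TOP: stack_top,
        AddressSource.INCREMENTER: current_address + 1,
        AddressSource.EXTERNAL: external,
    }
    return candidates[source] & mask
```

A call such as `next_address(AddressSource.STACK_TOP, 0x40, 0x12, 0x30, 0x00)` models a return: the address saved on the stack is used and the other three sources are ignored.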

5.3.5.2. Condition Select. The condition select block is used to select one of
thirty-two possible conditions. These conditions are used to determine conditional
branches, calls, and returns. The original 32:1 mux routing was slightly modified for this
thesis effort to allow access to all 32 condition inputs. French gives a detailed discussion
on the construction of the 32:1 mux in his thesis [Fre86]. The condition selected depends
on the value of the 5 conditional mux select (CMS) lines generated from the XROM. The
conditional inputs, along with the values needed to select them, appear below.

Table  5:  Condition  Selects 


Tii>  r« n~  t  •  r  sr  |r  i  ,|,n  i  tn-v,  p.trt  i  ul.«r  -  -  n  lit.  .n*  will  !»*•  «  .  »•  ri  t  h»*  mi-  r.  «-•  «d< 

•  \  <-i  ;  m-  nt  i-  .ii--  uvx-l 


5.3.5.3. Incrementer. The incrementer can be implemented by placing a full
adder on each bit, setting one of the inputs to zero and the other to the previous address,
setting the carry in of the least significant bit to 1, and letting the carry propagate
through all the bits. Since one of the adder inputs is always zero, the equations can
be simplified from

Sum = A xor B xor Cin and
Cout = A (A xnor B) + Cin (A xor B)

to

Sum = A xor Cin and
Cout = A and Cin

Additionally, since Cin to the LSB is always 1, the equations for this bit simplify to
Sum = $\overline{A}$ and Cout = A. Thus, all bits but the LSB can be implemented with
half-adders as shown in Fig 27. The LSB can be implemented simply using a single inverter.
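
The simplified equations can be exercised bit by bit. The following is a minimal sketch of a ripple incrementer built from the half-adder equations above; the `width` parameter is an assumption of the example, since the address width is not restated here.

```python
def increment(address: int, width: int) -> int:
    """Add one to `address` using Sum = A xor Cin and Cout = A and Cin per bit."""
    carry = 1  # the carry into the LSB is always 1
    result = 0
    for bit in range(width):
        a = (address >> bit) & 1
        result |= (a ^ carry) << bit  # Sum = A xor Cin
        carry = a & carry             # Cout = A and Cin
    return result  # the final carry out is dropped, so the count wraps


def lsb_sum(a: int) -> int:
    """With Cin fixed at 1 the LSB sum is just the inverse of A (one inverter)."""
    return 1 - a
```

Comparing `increment(a, 8)` with `(a + 1) % 256` for every 8-bit value confirms that the half-adder chain behaves as an ordinary incrementer.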

6.3.8. Subroutine Stack. The stack is used to store the return address for a
call. On a call, the address of the next instruction is pushed on the top of the stack. On
a return, the top of the stack is popped, and this becomes the address of the next instruction.
For the PFA controller microcode, there will only be one call active at a time, meaning
that the stack only needs to store one address. Figure 28 shows the stack for more
than one stage to illustrate the stack operation. When Push is activated, the input is fed
into the first MSFF and the values of the following MSFFs are fed into the next MSFF.
The 2-phase clocking keeps the values from overwriting each other until the proper time.
When the Pop is activated, the output comes from the top of the stack and all other


Figure 27 Incrementer Half-Adder
The conditional functions are executed when the selected condition is activated


condition  mux  field  described  earlier 


5.3.7. Data Side. The data side of the PFA controller includes the registers,
the arithmetic logic unit, and any special configurations for data handling, including
special register interfaces and data path insertion.

5.3.7.1. Register Cells. The data in the PFA controller is stored mainly in
registers. This gives a uniform method of access and increases modularity and regularity
in the layout. The basic register cell is shown in Fig 29(a). The cell is based on the
MSFF described earlier. The input is loaded into the first part of the cell when Load is
high and the rising edge of PQ2 occurs. The data is latched on the falling edge of PQ2.
On the rising edge of PQ1 the data is loaded into the main part of the cell. When the
cell is to be driven on the A or B bus, the signal lets the value pass through the t-gate
to the bus. The three inverters from the PQ1 t-gate to the bus are staged to
provide maximum current drive to the bus line. Two other register cells are needed by
the PFA controller. The first allows the cell to be driven from external sources other
than the bus, and the second allows the cell to drive its value to a destination other than
the bus. The externally loadable cell is shown in Fig 29(b). In the cell, the load t-gate is
by-passed and the input is driven in when the other load signal is raised. The driveable
cell is shown in Fig 29(c). In this cell the value is tapped off right before the t-gate
and passed through a t-gate followed by an inverter. In all, four cells are needed to implement
all the registers: a basic cell, a loadable cell, a driveable cell, and a loadable and driveable
cell. The loadable and driveable cell is just the extra cell circuitry for both options added

The A and B busses are precharged for two reasons. First, the time to pull up the
line can be incorporated into areas of the clocking where no useful computation is taking
place (i.e., register selection), and second, it allows a decrease in the register cell size. The
standard inverter contains a p device for pulling up to logic-1 and an n device for pulling
down to logic-0. In using a precharged bus, if the inverter output is a logic-1 no action is
taken; however, if the output is a logic-0, then the bus line is pulled to logic-0. Therefore, by
using precharged buses, the p-device pullup is not needed. This significantly reduces the
area needed for the register cell. This size directly impacts the register arrays. If a cell is
decreased by one lambda in the vertical direction, this equates to a decrease of 28 lambda
for the entire array, one lambda per register. The removal of the p-device reduced the size
of the register cell by approximately 12 lambda in the vertical direction.
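
The behaviour of a precharged bus line and the area arithmetic quoted above can be illustrated as follows. This is a behavioural sketch only, not a circuit model; the function and constant names are chosen for the example, and the figures of 28 registers and 12 lambda are the ones given in the text.

```python
REGISTERS_IN_ARRAY = 28      # one lambda saved per register, per the text
LAMBDA_SAVED_PER_CELL = 12   # approximate vertical saving per cell, per the text


def precharged_bus_bit(driver_outputs):
    """Return the value of one bus line after the evaluate phase.

    driver_outputs lists the output (0 or 1) of each register cell that is
    selected to drive the line; an empty list means nothing is driving.
    """
    bit = 1                  # the precharge phase leaves the line at logic-1
    for out in driver_outputs:
        if out == 0:         # only an n-device pulldown exists in the cell
            bit = 0
    return bit


def array_height_saving(per_cell=LAMBDA_SAVED_PER_CELL,
                        registers=REGISTERS_IN_ARRAY):
    """Total vertical saving for the register array, in lambda."""
    return per_cell * registers
```

With the quoted numbers, `array_height_saving()` gives 336 lambda for the full array.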

5.3.7.2. Registers. This section describes all the registers and how the data is
mapped in them. The previous section described the types of register cells that are
used. Each register is an array of the type of cell needed according to the register's function
as described in chapter 3, Section 3.2.5. The register cell is an array of 16 basic
register cells. This cell is used to implement the error count registers (WD, EC, EU, and
PE), the TOUT register (Timeout), and the TEMP register. The sregister cell is an array
of 16 loadable/driveable register cells used to implement the TSR and PSR registers. The
dregister is a register using 16 of the driveable register cells and is used to implement the
CCR register. Finally, the ELR is made up of 16 loadable register cells. The interfaces
with the registers are described in the following section.

The least complicated registers are the error count registers. These registers, including
the 9 watchdog counters (WD11-WD33), the 3 parity error counters (PE1-PE3), the 3
error corrected counters (EC1-EC3), and the 3 error uncorrectable counters (EU1-EU3),
store a 16-bit count indicating how many errors have occurred for the event the register
counts. The WD registers are used not only for fault monitoring, but to determine which
WD will become the next active processor should that become necessary.

The CCR and the NCR store the current configuration and the next configuration,
respectively. Each of the nine MSBs is associated with a WFT processor and indicates,
by a logic-1, if the processor is active. Accordingly, a logic-0 indicates that the processor
is in watchdog mode. The bit to processor translation is shown in Table 7.

Table 7: Bit to Processor Translation

Processor  WD11  WD12  WD13  WD21  WD22  WD23  WD31  WD32  WD33  not defined
Bit          15    14    13    12    11    10     9     8     7      6-0


The ELR, Error Location Register, indicates where errors, if any, have occurred during
the DFT computation just completed. The first nine bits are similar to those for the
CCR and NCR, but the following six bits are associated with parity errors and memory
uncorrected errors. There is one bit position for each of the three columns for
Parity Error and ECCU. The translation is shown in Table 8.

Table 8: ELR Translation

Error  WD11  WD12  WD13  WD21  WD22  WD23  WD31  WD32  WD33  PE1  PE2  PE3  EU1  EU2  EU3  nd
Bit      15    14    13    12    11    10     9     8     7    6    5    4    3    2    1   0
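
As an illustration of how a host might read Table 8, the sketch below turns a 16-bit ELR value into a list of flagged error sources. The function and constant names are hypothetical, and the bit order follows the table as reconstructed from the scan (bit 15 = WD11 down to bit 1 = EU3, with bit 0 not defined).

```python
# Field names in bit order, starting at bit 15 and ending at bit 1.
ELR_FIELDS = [
    "WD11", "WD12", "WD13", "WD21", "WD22", "WD23", "WD31", "WD32", "WD33",
    "PE1", "PE2", "PE3", "EU1", "EU2", "EU3",
]


def decode_elr(word: int) -> list:
    """Return the names of all error bits set in a 16-bit ELR value."""
    flagged = []
    for index, name in enumerate(ELR_FIELDS):
        bit = 15 - index                 # WD11 sits in bit 15
        if (word >> bit) & 1:
            flagged.append(name)
    return flagged                       # bit 0 is not defined and is ignored
```

For example, `decode_elr(0x8040)` reports a watchdog error at WD11 together with a parity error in column 1.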


The TSR and the PSR store the temporary and permanent scale factors, respectively.
The TSR is used to store the scale factors to be driven to the WFT processors
and to receive the output scale factors from the processors after computation. Once the
scale factors are received after a computation, their values are added to the PSR for scale
accumulation. The TSR is then shifted so that the scale factors can be passed to the next
stage. Each problem uses five bits to store the accumulated scale factor. Five bits are
needed in the worst case, when all the scale factors are 7 (i.e., 111 + 111 + 111 + 111 = 11100).
Once the contents of the TSR are added to the PSR, it too is shifted so that the least
significant set of five store the total scaling of the next DFT to complete. The bit translation
is shown in Table 9.
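
The worst-case arithmetic can be checked with a small model of the packed scale-factor fields. This sketch assumes three 5-bit fields in bits 15-11, 10-6, and 5-1 (an interpretation of Table 9) and models the accumulation as a plain 16-bit add; the helper names are hypothetical and the shifting step is not modelled.

```python
FIELD_SHIFT = {1: 11, 2: 6, 3: 1}  # assumed LSB position of each problem's field
FIELD_MASK = 0x1F                  # five bits per problem


def pack_fields(values: dict) -> int:
    """Pack per-problem scale factors into one 16-bit word."""
    word = 0
    for problem, value in values.items():
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("scale factor does not fit in five bits")
        word |= value << FIELD_SHIFT[problem]
    return word


def unpack_fields(word: int) -> dict:
    """Split a 16-bit word back into its per-problem scale factors."""
    return {p: (word >> s) & FIELD_MASK for p, s in FIELD_SHIFT.items()}


def accumulate(psr: int, tsr: int) -> int:
    """Add the TSR word to the PSR word; fields never carry into each other
    as long as every accumulated factor stays at or below 31."""
    return (psr + tsr) & 0xFFFF
```

Accumulating a scale factor of 7 four times leaves 28 (binary 11100) in each field, which is why five bits per problem are enough.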

Table 9: TSR and PSR Translation

Register   Problem 1        Problem 2        Problem 3
           Scale Factors (MSB-LSB)
TSR        0  0  s2 s1 s0   0  0  s2 s1 s0   0  0  s2 s1 s0
PSR        s4 s3 s2 s1 s0   s4 s3 s2 s1 s0   s4 s3 s2 s1 s0
Bit        15 14 13 12 11   10 9  8  7  6    5  4  3  2  1

The Problem Status registers (PS1, PS2, PS3) are used to store information about
each of the three problems in the pipeline. The five MSBs store the accumulated scale
factor for the problem; the remaining bits are used for error identification. In this way,
the host can determine, in the event of a failure, which problem is bad and where the
error occurred. The LSB indicates whether the problem finished. This identifies which
processor column did not finish in the event a timeout failure occurs. The information
in the PSi registers duplicates the information stored in other registers, but consolidates
it by problems for faster identification by the host. The bit translation is shown in Table
10.

Table 10: PSi Translation

       Scale Factors     Active Error   nd   Parity Error   Memory Error   Done
PSi    s4 s3 s2 s1 s0    C1 C2 C3            C1 C2 C3       M1 M2 M3       Dn

tinations, or a combination of the three. The special requirements are summarized in
Table 11.

Table 11: Special Register Requirements

Register   Special Requirements
TSR        5-bit 3-position shiftable
           Drive State/Scale Bus
           Load from State/Scale Bus
PSR        5-bit 3-position shiftable
           Drive sets of 5 bits to PS1, PS2, PS3
ELR        Ld 9 MSBs from Error Assignment
PSi        Load 5 MSBs from PSR
           Load LSB from Done Input Pads
CCR        Drive 9 MSBs to State/Scale Bus

Each of the interfaces is shown in Fig 30. The shift is implemented as follows: the
shift signal (ShiftPSR for the PSR and ShiftTSR for the TSR) allows the tapped value to
pass through the t-gate and directly into the load by-pass in the destination cell. For the
load, a t-gate is put before the load by-pass. The signal on the input line is fed directly
into the cell when the t-gate is turned on (LdScale for TSR, LdELR for ELR, LdScale for
PSi). For the drive, a t-gate is attached to the tap point before the drive inverters of the
register cell. The t-gate, when on, allows the tapped value to pass through and onto a
staged inverter to drive the line. The t-gate prevents unnecessary capacitance when the
cell is not driving a line. For the TSR, another t-gate must be placed before the connection
to the state/scale bus. This prevents the previous scale values from driving the bus
while the state values are being driven to the state/scale bus from the CCR. The
state/scale bus and its interfaces are discussed in a later section. For the CCR, this is not
needed because of the extra circuitry needed to select the state information.


The extra circuitry is needed for the CCR when it drives the state information onto
the state/scale bus. This circuitry selects which of the configuration bits is driven. Since it
is possible to load the state of three WFT processors at once, one in each column, the
circuitry selects which three bits to drive. This circuitry is shown in Fig 31. The LRi
(i = 1, 2, 3) signal selects which of the rows to drive, and the LdState signal allows the
information to drive the state/scale bus only when loading the state, thus preventing
fighting with the scale information.

5.3.7.4. Register Selection. The register selection cell determines whether the
register is selected to drive the A bus, drive the B bus, or be loaded from the C bus. The
cell has the same vertical space as a register cell. This allows the register selects to array
right alongside the registers. Inside the cell are the three selection circuits. The selection
circuit is based on ANDing combinations of the five selection bits and then ANDing this
result with precharge. A gate-level description is shown in Fig 32.

Figure 31 Extra Circuitry for Selecting Configuration Bits


Figure 32 Gate-Level Description of a Register Select


The inputs for each select (A bus drive, B bus drive, C bus load) are fed into a fully
CMOS NAND gate. The NAND was chosen over the NOR to make the transition delay
more equal in both directions for the least amount of area. If a NOR were used, the delay
for a 0-1 transition would be slower than the delay for a 1-0 transition because of the
lower carrier mobility in p-diffusion, since the '1' must pass through 5 p-transistors. This
could be offset by increasing the length of the gates on the p-diffusion, but this would greatly
increase the area. Instead, the NAND is used, where the delay for a '1' is through only
one p-transistor. The difference between the two types of implementation is shown in
Fig. 33. The output of the NAND gate is inverted to form the AND, and this is gated to
a NAND along with precharge, followed by an inverter. The purpose of these two gates,
making an AND, is to prevent the selection circuit from activating the selection lines
during precharge. If either of the drive lines were to be activated during precharge, the A
or B bus might not be properly precharged. If the load line were activated during
precharge, the precharged lines would be loaded into the register, destroying all previous
data. The output of the inverted NAND is then staged up to drive all the selection lines
for the register cells.

The VLSI implementation of the cell allows them to be stacked vertically for modularity
in the design. The selection bits, along with their complements, run vertically
through the array of register selects. To personalize a register select, the select signal or
its complement is fed into the NAND gate through the use of a select bit cell. This style
of implementation allows easy change from one personalization to another so that only
the configuration of the select bit cells is different for each register select. To select register
00111, for example, the personalization would be (sel_0, sel_0, sel_1, sel_1, sel_1), where
sel_0 selects $\overline{signal}$ for the n-diffusion 'and' and $\overline{signal}$ for the p-diffusion 'or', and sel_1
selects signal for the n-diffusion 'and' and signal for the p-diffusion 'or'. A cifplot of a
register select cell that shows the personalization for 00111 is shown in Fig. 34.

Figure 33 NAND and NOR VLSI Gates
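
The register select logic described in this section (a personalized 5-input NAND, an inverter, a second NAND that gates with precharge, and a final inverter) can be sketched at gate level. The sketch assumes that `precharge` is 1 during the precharge phase and that its complement is what reaches the second NAND; the polarity and all names are assumptions of the example.

```python
def nand(*inputs: int) -> int:
    return 0 if all(inputs) else 1


def inv(x: int) -> int:
    return 1 - x


def register_select(select_bits, personalization, precharge: int) -> int:
    """Return 1 when this register's select line should be active.

    personalization holds the register's 5-bit code; a 1 picks the true
    select signal (sel_1) and a 0 picks its complement (sel_0).
    """
    gate_inputs = [bit if want else inv(bit)
                   for bit, want in zip(select_bits, personalization)]
    decoded = inv(nand(*gate_inputs))          # AND of the personalized inputs
    return inv(nand(decoded, inv(precharge)))  # blocked while precharging
```

With the personalization `(0, 0, 1, 1, 1)`, only the select code 00111 activates the line, and never while `precharge` is 1.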

5.3.8. Data Path Insertion. The XROM must be able to insert a literal onto
the C bus. This allows constants to be loaded from the XROM microinstruction to the
register. This is accomplished by using an array of t-gates controlled by LdCbus which
drives the XROM field onto the C bus.

Figure 34 Register Select Cell with 00111 Personalization


5.3.9. Arithmetic/Logic Unit. The Arithmetic/Logic Unit (ALU) computes
the data manipulations for the controller and was designed by Capt Dave Gallagher
[Gal87]. It uses four functional units and one passive unit for each stage. The functional
units include logic to implement addition, AND, OR, and XOR; the passive unit computes
a MOV by passing the data through unchanged. The A bus feeds directly into all
five units. The B bus, however, is fed into a selection unit that selects either B, $\overline{B}$, 0, or 1
depending on the function desired. A 5:1 multiplexer selects which unit is output to the
C bus. Figure 35 shows the implementations of the three logic functions. Additionally,
the ALU will compute four flags: overflow, zero, negative, and carryout. The PFA controller
uses only the zero and the negative flags.

5.3.9.1. Integer Adder. The integer adder works with two's complement
arithmetic based on the carry-select method [Wes85]. With the carry-select method, the
sum and carry out are computed for both a carry-in of zero and a carry-in of one. When

Figure 36 Carry Select Adder Blocking

the carry sum may also be represented as

Carry = (A xnor B)A + (A xor B)C

This implementation for the carry allows the A xor B signal generated for the
sum to be reused, reducing the amount of circuitry per cell. The implementation is shown
in Fig 37.
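
The carry expression above can be verified exhaustively against ordinary binary addition. The sketch below is a truth-table check with illustrative function names; it models one adder cell only, not the carry-select blocking.

```python
from itertools import product


def full_adder_reference(a: int, b: int, c: int):
    """Sum and carry obtained by plain integer addition."""
    total = a + b + c
    return total & 1, total >> 1


def adder_cell(a: int, b: int, c: int):
    """Sum and carry built from the shared A xor B signal."""
    a_xor_b = a ^ b
    a_xnor_b = 1 - a_xor_b
    total = a_xor_b ^ c                      # Sum = A xor B xor C
    carry = (a_xnor_b & a) | (a_xor_b & c)   # Carry = (A xnor B)A + (A xor B)C
    return total, carry


def cells_agree() -> bool:
    """True when both formulations match for all eight input combinations."""
    return all(adder_cell(*bits) == full_adder_reference(*bits)
               for bits in product((0, 1), repeat=3))
```

Because the two forms agree on every input, reusing the A xor B signal for the carry costs nothing in correctness.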

Figure 37 ALU Adder Cell

5.3.9.2. Functions. The functions computed by the ALU as well as the signals
needed to generate them are described as follows:


Table 12: ALU Control and Functions

Carry In   Flags
1          all
Cin        all
1          all

5.3.10. Host Control Interface. This section describes the host interface with
the PFA controller during the WAITGO loop. The WAITGO loop in the microcode is
used so the host can examine the PFA registers and change them if necessary. For the host to
examine any register, it must be able to select a register to drive onto the data bus. For
loading, it must be able to put the input data onto the data bus and select the register
to load. The signal HOSTCONTROL determines whether the inputs to the register
selects come from the XROM or from the host, and the signal LOADSTUFF determines
the data flow direction. This allows the same set of host register selection signals to be used
for both reading and writing. It also allows the data pads to be used for both reading and
loading of data. Thus, the number of pads is 21 (16 data, 5 register selection) instead of 42.
Figure 38 shows how the determination is made for the register selection. The host register
selection source for the A bus will be the XROM for $\overline{\text{HOSTCONTROL}}$ + LOADSTUFF,
and the source for the C bus selection will be the XROM for
$\overline{\text{HOSTCONTROL}}$ + $\overline{\text{LOADSTUFF}}$; otherwise, it will be the host. This ensures that when
the host is in control, but not using one of the register selects, the inputs will be set to all
zeros from the XROM, preventing nondeterministic results from floating lines. Figure 39
shows the source and destination determination for the data busses. For the data bus,
data will flow from the pads to the C bus for HOSTCONTROL · LOADSTUFF and from
the A bus to the pads for HOSTCONTROL · $\overline{\text{LOADSTUFF}}$. Using the LOADSTUFF signal
prevents the A bus and the C bus from being shorted together.

Figure 38 Source Determination for Register Selection

Figure 39 Source and Destination Determination for Data Busses

The signals and their effects are summarized in the following:


Table  13:  Register  Selection  and  Bus  Determination 


HOSTCONTROL 

LOADSTUFF 

(H 

A  bus 
destination 

C  bus 
select 

C  bus 

source 

0 

0 

XROM 

internal 

internal 

0 

1 

HOST 

intern  al 

XROM 

internal 

1 

0 

XROM 

H335EH 

XROM 

nsm 

1 

1 

XROM 

internal 

HOST 

PADS 

5.3.11. Periphery. The peripheral circuitry contains that which does not fit
into either of the two previous major sections. This circuitry includes the interfaces to
the WFTs, the state/scale bus, the voting circuitry, the load circuitry, the scale factor
handling, the DFT size handling, the Done? signal generation, the ERROR signal generation
and associated signals, and the toggle flip-flop.

5.3.11.1. State/Scale Bus. The state/scale bus is used to transmit data to
the WFTs about state and scale information as well as receive the new scale information.
The state/scale bus consists of nine lines with a group of three representing the
state/scale bus for a particular WFT pipeline column. The data flow for the bus is
shown in Fig. 40. The three main signals that control the data flow are LdState,
LdScale, and DriveScale. The scale information flows into or out of the TSR. DriveScale
controls the t-gates above the TSR register that allow the register to drive the signals,
and LdScale controls the t-gates above the register to load in values, by-passing the C
bus. The state information is generated from two places: the size information comes from
the size storage cells, and the WFT processor watchdog configuration information comes
from the CCR. A logic-1 tells the WFT to be active, and a logic-0 tells it to be a watchdog.
Since the columns have independent state/scale busses, they can be loaded concurrently.
The LR1, LR2, and LR3 signals from the XROM select which bit from the
CCR to drive onto the state/scale bus. These signals are further gated by LdState so as
not to interfere with loading or receiving of the scale information. The translation of the
state/scale bus to the WFT interface for each column is as follows:

Figure 40 Data Flow for State/Scale Bus

Table 14: PFA-WFT Interface Translation

PFA   ST1      ST2         ST3
WFT   WD/SC0   SIZE1/SC1   SIZE2/SC2

5.3.11.2. Voting Circuitry. The voting cell is used to assign an error to one
of three inputs according to the voting strategy described in chapter 3. The voting cell
has several components. The first component is the cell called 3vote. This cell actually
implements the voting strategy. From Karnaugh maps with inputs i, j, k, the following
logic equations result for each of the three inputs to determine error assignment:

$error_i = \overline{i}jk + i\,\overline{j}\,\overline{k}$,
$error_j = i\,\overline{j}k + \overline{i}j\overline{k}$,
$error_k = ij\overline{k} + \overline{i}\,\overline{j}k$.

One particular implementation is shown in Fig. 41(a). The problem with this implementation
is uneven capacitive loading. The capacitance loading on the c input is 1.5 times
that of b and much larger than that of a. Also, the a signal must travel through two t-gates,
whereas b and c do not. To reduce this imbalance, thus increasing speed, the solution
shown in Fig. 41(b) was chosen. This distributes the load while still maintaining the
mutual exclusion needed for the multiplexers. It does, however, increase the number of
t-gates needed because of the need to prohibit floating nodes. The cell is repeated three
times, one for each set of three WFT processors. The next component in the voting cell is
the column error generator. This cell looks at all of the nine error lines output by the
3vote cells and determines which columns, if any, contain errors. The column generator
unit is made up of three 3-input OR gates. The OR gates are made by a 3-input NOR
gate followed by an inverter. The column error signals go to the branch circuitry. The
column error signals are used to reduce the number of lines of microcode by narrowing
the location of the error to a column. The third component is a set of t-gates, which
gates the signals from the 3vote cells to the stageup cell using the LdELR signal. The
stageup cell inverts each of the nine signals four times with increasing gate size in a ratio
of 1:2:4:8 to stage up the signals for greater current drive due to the load on these nine
lines. The stageup is necessary because the nine error lines go to the ERROR? cell
(described later), to the ELR, and to the branch logic.

(a) Voting Circuit

(b) Modified Voting Circuit

Figure 41 Voting Implementation
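
The error-assignment logic of the 3vote cell can be illustrated with a short truth-table model. The sketch assumes the odd-one-out reading of the voting equations (an input is flagged when it disagrees with two inputs that agree with each other); the voting strategy itself is defined in chapter 3 and is not reproduced here, and the function names are illustrative.

```python
def assign_errors(i: int, j: int, k: int):
    """Return (error_i, error_j, error_k) for three redundant inputs."""
    ni, nj, nk = 1 - i, 1 - j, 1 - k
    error_i = (ni & j & k) | (i & nj & nk)
    error_j = (i & nj & k) | (ni & j & nk)
    error_k = (i & j & nk) | (ni & nj & k)
    return error_i, error_j, error_k


def column_error(error_lines) -> int:
    """3-input OR (a NOR followed by an inverter) over one column's error lines."""
    nor = 0 if any(error_lines) else 1
    return 1 - nor
```

When all three inputs agree no error is assigned, and when exactly one input differs only that input is flagged, so at most one of the three outputs is ever high.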

5.3.11.3. WFT Processor Loads. The load signals indicate to a WFT processor
that state information is being loaded. This information is loaded when configuring
the pipeline. When the LOAD line to a WFT processor is high, the information is loaded
into flip-flops that store the information until the LOAD line becomes high again. Each
WFT processor must have its own LOAD signal since each must be configured
Table 15. WFT DFT Size Determination


SIZEO  SIZE2  Size  Translation 


5.3.11.6. Done?. The DONE signal generated by this cell is used as a condition
input for branching until the input host and the three WFTs have completed their
operations. Each of the input DONE signals is fed into a NAND gate; the output of
this gate is inverted three times to produce the desired AND product and to stage up the
signal to travel across the chip to the branch logic.
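
The NAND-plus-three-inverters arrangement described above can be checked with a small behavioral sketch. This is only a logic-level illustration, not the layout: the function names and the four-input ordering (host first, then the three WFTs) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* 4-input NAND gate: low only when every input is high. */
bool nand4(bool a, bool b, bool c, bool d) { return !(a && b && c && d); }

/* Single inverter stage (in the real cell each stage is also a larger driver). */
bool inv(bool x) { return !x; }

/* DONE condition: NAND followed by an odd number (three) of inversions,
 * which is logically the AND of the four inputs. Argument names are
 * hypothetical labels for the host and the three WFT DONE lines. */
bool done_all(bool host_done, bool wft_a_done, bool wft_b_done, bool wft_c_done)
{
    bool n = nand4(host_done, wft_a_done, wft_b_done, wft_c_done);
    return inv(inv(inv(n)));
}
```

Because the number of inversions after the NAND is odd, the result is the AND product; an even count would have reproduced the NAND.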

5.3.11.7. Error?. The ERROR? cell determines if an error occurred during a
DFT computation and also generates signals isolating the error to a set of input bits.
Figure 44 shows the gate-level description of the circuit.

Figure 44 Gate-Level Description of ERROR? Cell

The three sets of inputs come from the Voting Circuitry, the Parity Error pads, and the
Error Code Uncorrected pads. Any one of these bits being high invalidates the data where
the error occurred. The logic goal is to implement an OR of all the signals. At the same
time, the area of the error needs to be identified; therefore a simple 15-input OR gate
cannot be used. Instead, a NOR gate is used for each set of inputs. The inverted NOR,
making an OR, is used to generate the error flag for each input set. The outputs of the
NOR gates are fed into a NAND gate to produce the ERROR flag. A direct implementation
of the boolean equations with OR gates in CMOS would have required two levels of gates
for each OR, since an OR gate is produced by inverting the output of a NOR gate. The
boolean transformation from the input to the ERROR signal can be seen as follows:


each of the NOR gates produces:

WDerr = WD11err + WD12err + \cdots + WD32err + WD33err
PEerr = PE1err + PE2err + PE3err
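
A behavioral sketch of the ERROR? logic may help show why NOR gates followed by a NAND give both the per-set flags and the overall ERROR flag. The struct, the function name, and the `ecu_err` flag name are assumptions for illustration; only WDerr and PEerr are named in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Flags produced by the cell. ecu_err is a hypothetical name for the
 * Error Code Uncorrected set. */
typedef struct {
    bool wd_err;   /* WDerr: any of the nine voting error lines high */
    bool pe_err;   /* PEerr: any of the three parity error pads high */
    bool ecu_err;  /* any of the three Error Code Uncorrected pads high */
    bool error;    /* ERROR: any of the 15 inputs high */
} error_flags;

/* NOR over n inputs: high only when every input is low. */
bool nor_n(const bool in[], int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (in[i]) return false;
    return true;
}

error_flags error_cell(const bool wd[9], const bool pe[3], const bool ecu[3])
{
    bool nor_wd  = nor_n(wd, 9);
    bool nor_pe  = nor_n(pe, 3);
    bool nor_ecu = nor_n(ecu, 3);

    error_flags f;
    f.wd_err  = !nor_wd;                          /* inverted NOR = OR */
    f.pe_err  = !nor_pe;
    f.ecu_err = !nor_ecu;
    f.error   = !(nor_wd && nor_pe && nor_ecu);   /* NAND of the NOR outputs */
    return f;
}
```

By De Morgan's law the NAND of the three NOR outputs equals the OR of all 15 inputs, so the ERROR flag costs two gate levels while the per-set flags come from the same NOR outputs.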


5.3.11.8. Toggle F/F. The toggle flip-flop is used to generate the FLIP signal for
the memories. The XROM simply indicates to the circuitry that the signal needs to be
changed. This way, the microcode need not test the previous value before
flipping it. The circuit needed to implement this must toggle its output every time the
input is pulsed from the XROM. This kind of flip-flop is a toggle flip-flop (TF/F). The
TF/F chosen for implementation is described in Glasser and Dobberpuhl [Gla85]. This
circuit, shown in Fig. 45, operates with a 2-phase clock and a reset. The reset signal
comes from the global reset signal for the PFA controller. The reset signal is needed to
put the TF/F into a deterministic starting state; otherwise the feedback loops are
undefined. The input is the FLIP bit from the XROM; this bit is raised for one clock
cycle before each DFT computation is started. The FLIP signal determines which side of the
memory is written to and which is read from.
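
The toggle behavior can be summarized in a short behavioral model. This is a sketch at the clock-cycle level, not the 2-phase gate-level circuit of Fig. 45; the type and function names are hypothetical, and the choice of 0 as the reset state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Behavioral model of the toggle flip-flop that drives FLIP. */
typedef struct {
    bool q;  /* current output: selects the memory side */
} toggle_ff;

/* Global reset: forces a known starting state (assumed to be 0 here). */
void tff_reset(toggle_ff *ff)
{
    assert(ff != 0);
    ff->q = false;
}

/* One clock cycle: the output changes only when the XROM FLIP bit is high. */
bool tff_clock(toggle_ff *ff, bool flip_bit)
{
    assert(ff != 0);
    if (flip_bit)
        ff->q = !ff->q;
    return ff->q;
}
```

Pulsing `flip_bit` for one cycle before each DFT alternates the output, so the microcode never has to read the previous value.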

5.3.11.9. Column Done Storing. If a timeout occurs, it will be necessary to
indicate which of the columns failed. This is done by loading the DONE signals from the
three WFT columns into the PSi registers at the same time the output scale factors are
loaded into the TSR. This way, the PSi registers will contain information as to whether the


Figure 45 Toggle Flip-Flop Gate-Level Representation


5.3.11.5. Size. The DFT size bits are handled in a manner similar to the
initial scale factor bits. When LOADSTUFF is high, the inputs at the two SIZE pads are
loaded into two register cells. The DFT size bits are driven to the state/scale bus when
the state is being loaded into each WFT processor with the LdState signal. Each WFT
processor (15, 16, and 17) receives the same DFT size. The size determines how many
words the WFT processor will use in the DFT computation. The interface to the
state/scale bus is shown in Fig. 43. Each of the size bits is gated through
LdState, even though this same signal drives the output of the storage cells, so that the
state/scale lines will not be shorted when loading scale information. The following
describes the meaning of the size bits for the WFT16:

Figure 43 Size Interface to State/Scale Bus


problem is still valid as well as which column did not finish. The WFT DONE signals are
loaded directly into the PSi registers with the LdScale signal from the XROM.


5.3.12.  Microcode  Development. 

The microcode is developed alongside the VLSI design. Design tradeoffs are made
between  the  microcode  and  the  hardware.  Routines  in  the  microcode  can  be  made  simpler 
by  increasing  the  hardware  complexity.  Tradeoffs  occur  when  the  complexity  of  the 
hardware  increases  more  rapidly  than  the  microcode  simplification. 

The  first  step  in  the  microcode  development  defines  the  microinstruction  fields.  The 
PFA  microinstruction  contains  10  fields  as  shown  in  Table  16. 


Table 16: Microword Format

Field        Bits
BR_SEL       0-8
ALU          9-12
LOAD_FD      13
ABUS         14-18
BBUS         19-23
CBUS         24-28
SPEC_FUNCT   29-41
WFTOP        42
PFADONE      43
NXT_ADDR     44-59

The BR_SEL (branch selection) field determines the branching conditions and selections. The
ALU field determines the operation of the ALU. The LOAD_FD (load field) field determines
whether the NXT_ADDR field is inserted into the datapath or not. The ABUS
and BBUS fields determine which register is driven onto the A bus and the B bus, respectively.
The CBUS field determines which register is loaded from the C bus. The
SPEC_FUNCT (special functions) field is a horizontally encoded field that controls certain
operations on the chip. The WFTOP field is used to start the WFT processors. The
PFADONE field is used to signal the Host that a DFT computation has been completed.
The WFTOP and PFADONE fields were kept separate from the SPEC_FUNCT field
to increase code readability and emphasize their importance. NXT_ADDR (next address)
is the final field in the word. This field specifies either the branch location or a
literal to be placed on the C bus.


The  BR_SEL  field  is  connected  to  the  control  sequencer  and  broken  down  into  three 
subfields  as  shown  in  Table  17. 


Table 17: BR_SEL Field

BR_SEL Field
Subfields (in order):  CMS  BR_ON  NAF

The CMS (conditional mux selector) subfield, enumerated earlier in Table 5, selects the
condition bit for a branch. The BR_ON subfield selects the condition bit when it is a
1, and the complement when it is a 0. This way, the microcode can branch on the
presence or absence of the condition. The NAF (next address field) subfield selects the
source for the next address. The NAF subfield is enumerated in Table 6.
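
The interaction of the CMS and BR_ON subfields can be expressed as a small function. This is a sketch only: the condition inputs are modeled as an array of booleans whose size is a parameter, because Table 5 (which enumerates them) is not reproduced here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether the branch condition is satisfied.
 *   cond   : the condition inputs to the conditional mux
 *   n_cond : number of condition inputs (an assumption-free parameter)
 *   cms    : index selected by the CMS subfield
 *   br_on  : 1 = branch on the condition, 0 = branch on its complement */
bool branch_condition(const bool cond[], unsigned n_cond, unsigned cms, bool br_on)
{
    assert(cms < n_cond);
    bool selected = cond[cms];
    return br_on ? selected : !selected;
}
```

With `br_on` low the same mux input serves a "jump if not" instruction, which is how mnemonics such as JnOP or Jn4DN can share the condition lines with their positive counterparts.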


The  ALU  field  consists  of  the  four  signals  a3-a0  that  are  connected  to  the  ALU.  The 
ALU  field  is  enumerated  in  Table  18  and  shown  below. 


Table 18: ALU Field

ALU Field
Signal   a3   a2   a1   a0
Bit       9   10   11   12

The LOAD_FD field is bit 13 in the microinstruction. When this bit is a 1, the contents
of the NXT_ADDR field are driven onto the C bus. From the C bus, they can be loaded into
any register. Accordingly, when the bit is 0, the field does not affect the C bus.

The ABUS, BBUS, and CBUS fields each select one of the 28 registers to be driven or loaded.
Five bits are needed to select among the 28 registers. The format of these fields is shown in
Table 19.
Table 19. Bus Fields

Field   Bits
ABUS    14-18
BBUS    19-23
CBUS    24-28

The select of a register is a combination of the 5 address bits. Each field uses the same
decoding scheme, enumerated in Table 20.

Table 20: Register Select Translation

Select   Register     Select   Register
00000    unused       10000    WD31
00001    EC1          10001    WD32
00010    EC2          10010    WD33
00011    EC3          10011    TSR
00100    EU1          10100    PSR
00101    EU2          10101    unused
00110    EU3          10110    PS1
00111    PE1          10111    PS2
01000    PE2          11000    PS3
01001    PE3          11001    ELR
01010    WD11         11010    NCR
01011    WD12         11011    CCR
01100    WD13         11100    TEMP
01101    WD21         11101    TOUT
01110    WD22         11110    unused
01111    WD23         11111    unused

Table 21: SPEC_FUNCT Field

SPEC_FUNCT Field
Bit   Control Signal
29    Flip
30    LdInit
31    LdScale
32    LdPSi
33    ShiftTSR
34    ShiftPSR
35    LdELR
36    HOSTCONTROL
37    LdR1
38    LdR2
39    LdR3
40    LdState
41    DriveScale

The WFTOP field is bit 42. When this bit is high, the WFT processors are allowed
to compute. Bit 43 is the PFADONE field. This bit is raised when the PFA has completed
a DFT computation.

Bits 44-59 specify the NXT_ADDR (next address) field. Sixteen bits are needed
because this field interfaces with the 16-bit C bus. Bits 51-59 are connected to the source
multiplexer in the control sequencer and represent the branch address.
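
As an illustration of the 60-bit microword layout, the following sketch packs and unpacks fields in a 64-bit integer. The bit ranges are those read from Table 16 and the surrounding text; the mapping of microword bit 0 to the least significant bit of the integer is an assumption made for the example, as are all type and function names.

```c
#include <assert.h>
#include <stdint.h>

/* Field identifiers, in microword order. */
enum {
    F_BR_SEL, F_ALU, F_LOAD_FD, F_ABUS, F_BBUS, F_CBUS,
    F_SPEC_FUNCT, F_WFTOP, F_PFADONE, F_NXT_ADDR, F_COUNT
};

typedef struct {
    unsigned lo;     /* first bit of the field */
    unsigned width;  /* number of bits in the field */
} field_pos;

/* Bit positions as listed in Table 16 (bit 0 taken as the LSB: assumption). */
const field_pos FIELD[F_COUNT] = {
    {0, 9}, {9, 4}, {13, 1}, {14, 5}, {19, 5}, {24, 5},
    {29, 13}, {42, 1}, {43, 1}, {44, 16}
};

/* Return word with field f replaced by value. */
uint64_t set_field(uint64_t word, int f, uint64_t value)
{
    assert(f >= 0 && f < F_COUNT);
    uint64_t mask = (UINT64_C(1) << FIELD[f].width) - 1;
    assert(value <= mask);
    return (word & ~(mask << FIELD[f].lo)) | (value << FIELD[f].lo);
}

/* Extract field f from word. */
uint64_t get_field(uint64_t word, int f)
{
    assert(f >= 0 && f < F_COUNT);
    uint64_t mask = (UINT64_C(1) << FIELD[f].width) - 1;
    return (word >> FIELD[f].lo) & mask;
}
```

For example, a literal load into TEMP would set LOAD_FD to 1, CBUS to the TEMP select code 11100 from Table 20, and NXT_ADDR to the 16-bit literal.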

The microcode word format is summarized in Appendix G. The second step in
microcode development takes the algorithm flow charts developed in Chapter 3 and
translates them into microcode routines. The first task according to Fig. 3-3 is initialization.
This is done by loading all the registers with 0s, as shown below:

RESET:  LOAD  REG REG TEMP #0000000000000000;
        LOAD  REG REG TEMP #0000000000000000;
        LOAD  REG REG WD11 #0000000000000000;
        LOAD  REG REG WD12 #0000000000000000;
        LOAD  REG REG WD13 #0000000000000000;
        LOAD  REG REG WD21 #0000000000000000;
        LOAD  REG REG WD22 #0000000000000000;
        LOAD  REG REG WD23 #0000000000000000;


When  consecutive  microword  fields  use  the  same  field  from  the  translation  file,  the 
defaults  must  be  explicitly  defined  for  the  fields  before  the  one  being  used.  In  this  exam¬ 
ple,  the  third  register  field  was  used  but  not  the  first  two.  Therefore,  the  defaults  for  the 
first  two  must  be  set. 

The  next  step  is  to  wait  for  the  Host  to  give  the  PFA  the  PFAOP  signal.  During 
this  time,  the  Host  is  allowed  to  examine  and  alter  the  register  contents.  The  loop  for 
this  is  shown  below. 

WAITGO: JnOP  HostCntl WAITGO;
              HostCntl;

This loop illustrates several important points about the microcode and the PFA controller.
First, the use of a label is shown. WAITGO is the name of the loop, and the line
number is substituted for WAITGO in the microinstruction. Second, the use of default
values makes the microcode more readable. If default values were not permitted, the same
two lines would be coded as shown below.

WAITGO: JnOP   ALU LOAD_FD REG REG REG HostCntl WFTOP PFADONE WAITGO;
        BR_SEL ALU LOAD_FD REG REG REG HostCntl WFTOP PFADONE NXT_ADDR;

Third, the pipeline requires that the instruction following a branch cause no undesirable
side effects. Since the fetching of the XROM microinstruction is pipelined, there is a
one-instruction delay before a branch occurs. In this instance, a nop is not needed because
HostCntl should be high until the branch occurs.

After PFAoperate is received, the controller compares the CCR and the NCR. If
these are different, a new configuration was requested. The controller then moves the
contents of the NCR to the CCR and loads the configuration data via the state/scale bus
to the WFT processors. The controller then toggles the LEFT signal to the WFT processors,
drives the scale factors from the TSR to the WFT processors via the state/scale
bus, and raises WFTOP.


The controller now waits for all four DONE signals from the input host and the
three WFT processors. The controller will only wait a predetermined time for the processors
to finish. This time is stored in the TOUT register. While the controller waits, it
increments the TEMP register and compares it to the TOUT register. When the two
registers are equal, a timeout has occurred. The host will detect this in the PSi registers
because the done bit will not be set. The code implementing this is shown below.

WAITDONE: INC   TEMP REG  TEMP WFTop;
          CMP   TEMP TOUT WFTop;
          JZ    WFTop SCALE;
                WFTop;
          Jn4DN WFTop WAITDONE;
                WFTop;
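
The control flow of the WAITDONE loop can be modeled in a few lines. This is a simulation sketch only: the DONE inputs are replaced by a parameter `done_after` giving the pass on which all four DONE signals are assumed to be high, TEMP is assumed to start at 0, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the WAITDONE loop with 16-bit registers.
 *   tout       : contents of the TOUT register
 *   done_after : pass number on which all four DONE inputs read high
 *   temp_out   : receives the final TEMP value
 * Returns true if the processors finished, false on a timeout. */
bool wait_done(uint16_t tout, uint16_t done_after, uint16_t *temp_out)
{
    uint16_t temp = 0;
    assert(temp_out != 0);
    for (;;) {
        temp++;                      /* INC TEMP REG TEMP */
        if (temp == tout) {          /* CMP TEMP TOUT; JZ SCALE */
            *temp_out = temp;
            return false;
        }
        if (temp >= done_after) {    /* Jn4DN WAITDONE not taken */
            *temp_out = temp;
            return true;
        }
    }
}
```

In the microcode both exits lead to the same scale-latching code; the model returns a flag only so that the two cases can be told apart.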


When  the  controller  exits  the  WAITDONE  loop,  it  latches  the  scale  factors  from  the 
processors  into  the  TSR,  drops  the  WFTOP  signal,  and  checks  for  errors.  The  reasons  for 
many  of  the  condition  inputs  will  now  be  explained. 

To  save  time  in  the  error  routine,  the  controller  isolates  the  error(s)  to  a  specific  set 
of  inputs.  A  sample  of  the  error  routine  below  illustrates  this. 


WDREGS: JnWD   ErrPE;
        OR     TEMP PS3 PS3;
        JnECo1 INCOL2;
        nop;
WD_11:  Jn11   WD_12;
        nop;
        INC    WD11 REG WD11;
WD_12:  Jn12   WD_13;
        nop;
        INC    WD12 REG WD12;
WD_13:  Jn13   INCOL2;
        nop;
        INC    WD13 REG WD13;
INCOL2: JnECo2 INCOL3;

ErrPE:  JnPE   ECCU;
        nop;

The controller first checks if the error occurred in the watchdog processors. If the error
was not in the watchdogs, it skips to the segment for Parity Error. Within the segment
for watchdog errors, the controller first narrows the error to a column and then to
a specific processor. This same approach is used for all the error signals.
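
The column-then-processor narrowing can be sketched as follows. The array layout (first index is the column, second is the processor within it) follows the WD11..WD33 naming used in the routine, but it is an assumption, as are the function name and the use of plain integers for the error counters.

```c
#include <assert.h>
#include <stdbool.h>

#define COLS 3
#define PER_COL 3

/* Update the watchdog error counters from the nine error lines.
 * A column whose three lines are all low is skipped with a single test,
 * which is the saving the column error signals provide.
 * Returns the number of counters that were incremented. */
int count_watchdog_errors(bool err[COLS][PER_COL], unsigned count[COLS][PER_COL])
{
    int bumped = 0;
    assert(err != 0 && count != 0);
    for (int c = 0; c < COLS; c++) {
        bool col_err = err[c][0] || err[c][1] || err[c][2];
        if (!col_err)
            continue;                /* JnECo<c>: no error in this column */
        for (int p = 0; p < PER_COL; p++) {
            if (err[c][p]) {         /* Jn<c><p> not taken */
                count[c][p]++;       /* INC WD<c><p> */
                bumped++;
            }
        }
    }
    return bumped;
}
```

In the common case of no errors, three column tests replace nine individual tests.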

The controller then determines if an active processor faulted by comparing the CCR
and the ELR. If an active processor was at fault, the pipeline must be reconfigured. The controller
looks at the watchdog error counts for the two current watchdogs and assigns the one
with the lowest error count to be the next active processor. Once the controller has set up the new
configuration in the NCR, it will be different from the CCR, and RECONFIGURE will be
called when the controller starts the next problem. The controller now sends the PFADONE
signal to the host and waits for PFAOP.
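
The replacement choice and the reconfiguration test described above amount to two small comparisons. The sketch below uses hypothetical function names; the tie-breaking rule (prefer the first watchdog when the counts are equal) is an assumption, since the text does not say how ties are resolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Choose which of the two current watchdogs becomes the next active
 * processor. Returns 0 for the first watchdog, 1 for the second.
 * Ties go to the first watchdog (assumption). */
int pick_next_active(unsigned err_count_first, unsigned err_count_second)
{
    return (err_count_second < err_count_first) ? 1 : 0;
}

/* A new configuration is pending whenever the NCR differs from the CCR. */
bool needs_reconfigure(uint16_t ccr, uint16_t ncr)
{
    return ccr != ncr;
}
```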

5.3.13. PFA Controller Summary.

This chapter has described the VLSI design for the memory and the VLSI design
and microcode development for the PFA controller. The PFA controller consists of
several major functional units and interfaces. The high-level interaction of these units
can be better appreciated in Fig. 46. This figure shows the major parts of the controller
and their approximate location on the chip.


CHAPTER  6 


Results 


6.1.  Results 

This thesis effort has produced a prototype memory chip, the layout for the full
memory chip and the PFA controller, and a generic microcode assembler.

6.1.1. Memory Chips. A prototype memory chip was designed and fabricated.
It was designed to test the address decoders, the memory cell, the sense amplifiers, and
the bit line logic. The chip was fabricated in a 28-pin package using a 3 micron CMOS process
through MOSIS (MOS Implementation Service). A photomicrograph of the fabricated
chip is shown in Fig. 47. The chip contains 32 words of 10 bits each. One of the
bits in each word was used for the dummy bitline and one other bit was unused.

A larger memory chip was also designed and submitted for fabrication. This chip
was designed to store 272 words by 24 bits. This is the memory size needed for a prototype
PFA pipeline using a WFT16 and a WFT17 processor. The larger chip also contains
all the circuitry to support error correction and detection. A cifplot of this chip
is shown in Fig. 48. The chip is 7900 microns by 9200 microns.

6.1.2. PFA Chip. A fully functional PFA controller was designed and submitted
for fabrication. The chip will be 7900 microns by 9200 microns and will sit in an
84-pin package. The chip contains over 23900 transistors. A cifplot of the chip is shown in
Fig. 49. Prior to submission, the chip was fully simulated using Esim and the design was
verified.




6.1.3. Generic Microcode Assembler Tool. A CAD tool was developed that
takes a microcode word format and a mnemonic translation file and builds a customized
microcode assembler. The assembler uses the translation file to generate a listing file, a
reference file, and a file of the translated microcode. Optionally, the assembler will produce
an output of the microcode suitable for input to the optimizing XROM compiler, a
file reverse compiled from the translated microcode, or a VHDL description of the
XROM.

GMAT was also used in two other thesis efforts and in a class taught at AFIT in
the Fall term. Capt. Dave Gallagher used GMAT for his microcode for application
specific processors [Gal87], and Capt. Larry Shand used GMAT on a microcode description
of a Kalman filter chip to generate a VHDL description [Sha87].

Capt. Gallagher used a preliminary version of GMAT in which much of the information
needed by GMAT was entered interactively. This process was tedious and very error
prone. After the initial assembler was created, GMAT was no longer used. Instead,
alterations to the assembler were manually inserted by the author.

Capt.  Shand  used  the  final  version  of  GMAT  for  the  Kalman  filter  application. 
With  this  version,  GMAT  extracted  all  information  from  the  translation  file  and  no  data 
was  entered  interactively. 

The  students  in  the  Introduction  to  Computer  Architecture  class  used  GMAT  in  the 
completion  of  their  group  projects. 
CHAPTER 7


Conclusions  and  Recommendations 


7.1.  Conclusions 

This thesis has shown that an application specific processor can be designed within
four months. With the current state of the CAD tool set, it should be possible
to design, submit, and receive application specific processors within one thesis cycle.
Efforts due to this thesis and the thesis by Capt. David Gallagher have improved the
CAD tool set such that this can be done. It is now possible to have an ALU that will be
useful in almost any processor, a control section that is correctly designed, an optimized
XROM, and a microcode assembler. Each of these improvements has special importance
and decreases the layout time. This thesis used the predesigned ALU, control sequencer,
and XROM. However, the design of the ALU was still in progress and the control
sequencer had never been completely debugged. The XROM functionality had already
been proven. By using pre-designed cells, the designer of the processor can spend more
time on other areas of design, including testability, controllability, and observability. The
designer will also be able to spend more time simulating the circuits in both SPICE, for
timing analysis, and Esim, for functional analysis. All of this increases the probability
that the chip will function properly when fabricated.

The microcode assembler is a very useful tool in several areas. First, the microcode
can be written in a form that is easy to read and less prone to errors. Once this microcode
has been written and debugged, it is no longer necessary for the writer to translate
the microcode into an integer format for the XROM optimizer. This greatly reduces the
time needed to generate an XROM and the opportunity for human error. Secondly, the
reference file containing the microword along with its translations is a useful debugging
tool. One can observe the intended instruction, the XROM outputs, and the connections
to the XROM. This lets the designer verify all connections to the XROM and the XROM
itself. Thirdly, the VHDL output of the XROM will be useful when chip-level verification
can be done using VHDL. The VHDL environment at the current time does not support
easy simulation of VLSI chips. However, when it does, this tool will already support an
XROM description.

7.2.  Recommendations 

Several areas in the Prime Factor Algorithm and in the CAD arena still need to be
addressed. In the PFA project, the fully functional memory chip and the PFA controller
must be tested. Additionally, the chips used for clocking need to be developed. These
chips need to be carefully designed to meet the requirements of the pipeline. The chips
should be very powerful and able to drive the large currents needed by the WFTs and
the capacitances associated with them. The WFT15 and WFT17 need to be designed. Although
they are just modifications of the WFT16, the layout and simulation will
take approximately one man-year for both. The work involved in these designs will be
intensive, but will not be suitable thesis material. The design and implementation of
these chips could be done by a staff engineer. Finally, the prototype PFA-WFT pipeline
will need to be implemented and tested. The prototype includes a WFT16 followed by a
WFT15 in a two-stage pipeline.

In the CAD arena, more areas still need to be developed for streamlined design of
application processors. The area most lacking in tools is simulation. Esim is the only tool
really used to verify chip design. It simulates at the switch level, but higher level
simulation is needed. Two areas of research are currently being developed at AFIT. These
are the STOVE (sim to VHDL extraction) project and the ongoing VHDL theses. The
STOVE project is attempting to extract a chip into a gate-level representation. Currently, it
extracts inverters, clocked inverters, and t-gates. The PFA controller was extracted at
this level and produced approximately 12000 lines of VHDL.

The  current  state  of  the  VHDL  environment  does  not  support  chip  level  simulation. 
When it does, however, this will become an important step in VLSI design. Chips will be
designed  first  at  the  VHDL  level,  the  layout  will  be  done,  then  the  chip  will  be  extracted 
back  to  a  VHDL  description  and  then  compared  to  the  original  VHDL  description.  The 
tool  to  complete  the  design  loop  will  be  able  to  take  the  original  VHDL  description  and 
compile  it  into  silicon.  This  decreases  design  time  and  eliminates  human  error  at  the  lay¬ 
out  design  level. 

This thesis effort, along with the thesis effort of Capt. David Gallagher [Gal87], has
shown that it is possible to generate a complete application specific processor within one
thesis cycle. This could be very important to the Air Force and the DoD, as well as
AFIT. The rapid development of VLSI chips will decrease the time needed to insert VLSI
technology into existing systems. All the design methodologies associated with application
specific processors can be applied to the design of VHSIC systems as well. The
AFIT VLSI environment could be developed so that high quality, fast turnaround application
specific processors could be produced and tested within one year. To encourage
this development, several areas could be explored. First, AFIT could be designated as an
Air Force "center of excellence." This would establish AFIT as an identified program and
allow more resources to be dedicated to VLSI design. Resources are the second area. A set
of hardware could be dedicated just for VLSI design. This should include at least one
Sun Workstation and a superminicomputer such as an ELXSI 6400. Third, a civilian staff
of at least two people could be dedicated to the VLSI design teams. One person would be
responsible for maintaining the CAD tools, systems, and general configuration management.
The second person could be a design engineer acquainted with the CAD tools and
the cell libraries and able to integrate the design into silicon rapidly. With these recommendations,
AFIT could be a leader in the field of VLSI/VHSIC insertion.
Corresponding parity generator matrix equations:

v_0 = u_0 + u_7 + u_8 + u_{14} + u_{16} + u_{21}
v_2 = u_1 + u_2 + u_8 + u_{10} + u_{18} + u_{23}
v_4 = u_3 + u_4 + u_{10} + u_{12} + u_{17} + u_{20}
v_6 = u_5 + u_6 + u_{12} + u_{14} + u_{19} + u_{22}
v_7 = u_6 + u_7 + u_{13} + u_{15} + u_{20} + u_{23}
Corresponding syndrome generator equations:

s_0 = r_0 + r_8 + r_{15} + r_{16} + r_{22} + r_{24} + r_{29}
s_1 = r_1 + r_8 + r_9 + r_{17} + r_{23} + r_{25} + r_{30}
s_2 = r_2 + r_9 + r_{10} + r_{16} + r_{18} + r_{26} + r_{31}
s_3 = r_3 + r_{10} + r_{11} + r_{17} + r_{19} + r_{24} + r_{27}
s_4 = r_4 + r_{11} + r_{12} + r_{18} + r_{20} + r_{25} + r_{28}
s_5 = r_5 + r_{12} + r_{13} + r_{19} + r_{21} + r_{26} + r_{29}
s_6 = r_6 + r_{13} + r_{14} + r_{20} + r_{22} + r_{27} + r_{30}
s_7 = r_7 + r_{14} + r_{15} + r_{21} + r_{23} + r_{28} + r_{31}
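
The syndrome equations can be evaluated with a short table-driven routine. This is a sketch under stated assumptions: the index table is a reading of the printed equations in which digits that are ambiguous in the scan (for example 26 versus 28) were resolved so that each syndrome matches the corresponding parity equation with r_{8+k} = u_k; bit j of the 32-bit argument is taken to be r_j, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Row i lists the received-bit indices summed modulo 2 into syndrome bit s_i. */
const int SYNDROME_TERMS[8][7] = {
    {0,  8, 15, 16, 22, 24, 29},
    {1,  8,  9, 17, 23, 25, 30},
    {2,  9, 10, 16, 18, 26, 31},
    {3, 10, 11, 17, 19, 24, 27},
    {4, 11, 12, 18, 20, 25, 28},
    {5, 12, 13, 19, 21, 26, 29},
    {6, 13, 14, 20, 22, 27, 30},
    {7, 14, 15, 21, 23, 28, 31}
};

/* Compute the 8-bit syndrome of a 32-bit received word.
 * Bit j of r holds r_j; bit i of the result holds s_i. */
uint8_t syndrome(uint32_t r)
{
    uint8_t s = 0;
    for (int i = 0; i < 8; i++) {
        unsigned parity = 0;
        for (int t = 0; t < 7; t++)
            parity ^= (unsigned)((r >> SYNDROME_TERMS[i][t]) & 1u);
        s |= (uint8_t)(parity << i);
    }
    return s;
}
```

With this table a single-bit error produces a syndrome equal to the set of equations in which that bit appears, for instance s_0 and s_1 for an error in r_8.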




Corresponding  error  bit  equations: 

e0  =  S1  s2's3'si'sl>’s6'81 


e16  =  s0'srs2's3's4's5  56'57 
APPENDIX B


GMAT  Shell  Script 


#!/bin/sh
#
# DATE: 1 DEC 1987
# Version: 1.0
#
# NAME: gmat
# DESCRIPTION:
#   This shell script runs the Generic Microcode Assembler Tool (GMAT).
#   It first calls preg to parse the translation file and write out
#   the tailored C code. If the call to preg was successful, gmat
#   copies assem.c into the directory and compiles the assembler.
#   After compilation, gmat removes assem.c.
#
# AUTHOR: Robert S. Hauser
# HISTORY:
#
echo Running preg
if /eng/87d/rhauser/bin/preg $1
then
    echo Copying library source code into this directory
    cp /eng/87d/rhauser/bin/assem.c assem.c;
    echo Compiling your assembler
    cc -O -o assem assem.c -lm;
    echo Removing library source code
    rm assem.c
else
    echo Error in running preg.
    echo
fi


APPENDIX  C 


PREG  C  Code 


/********************************************************************
*
*  DATE: 1 DEC 1987
*  Version: 1.0
*
*  TITLE: Pre Gmat routine
*  FILENAME: PREG.C
*  COORDINATOR: Capt R W Linderman
*  PROJECT: Generic Microcode Assembler Tool (GMAT)
*  OPERATING SYSTEM: UNIX 4.3BSD
*  LANGUAGE: C
*  CONTENTS:
*      get_answer()
*      get_names()
*      build_assem()
*      get_micro_format()
*      scan_t_file()
*      update_blengths()
*
*  AUTHOR: Robert S. Hauser
*  HISTORY:
********************************************************************/

#include <stdio.h>

#define TRUE 1
#define FALSE 0

struct WD_FIELD_ENTRY {
    char name[1000];
    int literal;
    int label;
    int blength;
};

struct T_ENTRY {
    char name[1000];
    int blength;
};

struct WD_FIELD_ENTRY fields[1000];
struct T_ENTRY t_fields[1000];

FILE *tranfile, *gmathead;
char word[1000], answer[2];
int num_sub_fields;
int num_wd_fields, num_t_fields;
int bits_in_word;
int ok;
int i, k, j;
char wordend[1000];
char field_name_tmp[1000];
char empty[] = " ";
char fill[1000];
int ch;

main(argc,argv) int argc; char **argv;
{
    if (argc != 2)
    {
        printf("\nUsage: preg translation_file_name\n\n");
        exit(1);
    }
    if ((tranfile = fopen(argv[1],"r")) == NULL)
    {
        printf("\nFile %s could not be found.\n\n",argv[1]);
        exit(1);
    }
    get_micro_format();
    printf("\n");
    scan_t_file();
    update_blengths();
    build_assem(argv[1]);
}/*end main*/


/********************************************************************
*  DATE: 1 DEC 1987
*  Version: 1.0
*  PROCEDURE: get_names()
*  DESCRIPTION:
*      Reads translation file and pulls out the field name
*
*  PASSED VARIABLES:
*  RETURNS:
*      t_field_name : name of field
*      t_field_blength : field length
*
*  GLOBAL VARIABLES USED:
*      EOF
*      ch
*      tranfile
*      num_sub_fields
*  GLOBAL VARIABLES CHANGED:
*      ch
*      tranfile
*      num_sub_fields
*  FILES READ:
*      tranfile
*  FILES WRITTEN:
*  MODULES CALLED:
*  CALLING MODULES:
*      scan_t_file()
*
*  AUTHOR: Robert S. Hauser
*  HISTORY:
********************************************************************/

get_names(t_field_name, t_field_blength)
char t_field_name[];
int *t_field_blength;
{
    char t_field_value[100];
    num_sub_fields = 0;
    for(;;)
    {
        k = j = 0;
        ch = fgetc(tranfile);
        while (ch == ' ') ch = fgetc(tranfile); /* ignore leading blanks */
        if (ch == '\n' || ch == EOF) break;
        /* if at end of blank line or file quit*/
        while (ch != '\n' && ch != ' ')/* until EOL or blank */
        {
            if (num_sub_fields == 0) t_field_name[k++] = ch;
            ch = fgetc(tranfile);
        }
        if (num_sub_fields == 0) t_field_name[k] = '\0';
        num_sub_fields++;
        while (ch == ' ') ch = fgetc(tranfile); /* skip blanks */
        while (ch != '\n' && ch != ' ')
        {
            if (num_sub_fields == 1) t_field_value[j++] = ch;
            ch = fgetc(tranfile);
        }
        if (num_sub_fields == 1) *t_field_blength = j;
        /* ignore trailing blanks */
        while (ch == ' ') ch = fgetc(tranfile); /* ignore trailing blanks */
    }
}

/******************************************************************

*  DATE:  1  DEC  1987 

*  Version :  1.0 

*  PROCEDURE:  build_assem( ) 

*  DESCRIPTION: 

*  Makes  read_trans_table( ) ,  translate(),  make_reffile( ) 

* 

*  PASSED  VARIABLES: 

*  filename  :  name  of  translation  file 

*  RETURNS: 

*  GLOBAL  VARIABLES  USED: 

*  bits_in_word 

*  num_wd_fields 

*  GLOBAL  VARIABLES  CHANGED: 

*  FILES  READ: 

*  FILES  WRITTEN: 

*  assem.h 

*  assem.tailored

*  MODULES  CALLED: 

*  CALLING  MODULES: 

*  main( ) 

* 

*  AUTHOR:  Robert  S.  Hauser 

*  HISTORY:
******************************************************************/


build_assem(filename)  char  filename[];
{
    int  i;
    FILE  *aheader, *atailored;

    aheader = fopen("assem.h","w");
    fprintf(aheader,"#define  BitsInWord  %d\n",bits_in_word);
    for (i=0;i<num_t_fields;i++)
        fprintf(aheader,"struct  SYMBOL_ENTRY  %stbl[MaxSubFields];\n",
                t_fields[i].name);
    for (i=0;i<num_t_fields;i++)
        fprintf(aheader,"int  num_fields%s;\n",t_fields[i].name);
    atailored = fopen("assem.tailored","w");

    /********************  make  read_trans_table()  ******************/
    fprintf(atailored,"/*********************************/\n");
    fprintf(atailored,"read_trans_table()\n{\n");
    fprintf(atailored,"  char  fill[1000],fill1[1000];\n");
    fprintf(atailored,"  symbolfile = fopen(\"%s\",\"r\");\n",filename);
    fprintf(atailored,"  fscanf(symbolfile,\"%%[^\\n]%%[\\ \\n]\",fill,fill1);\n");
    for (i=0;i<num_t_fields;i++)
        fprintf(atailored,"  readin(%stbl,&num_fields%s);\n",
                t_fields[i].name,t_fields[i].name);
    fprintf(atailored,"  fclose(symbolfile);\n}/*  end  read_trans_table  */");

    /*************************  make  translate()  ********************/
    fprintf(atailored,"\ntranslate()\n{\n  ");
    fprintf(atailored,"stripped = fopen(strip_file,\"r\");\n");
    fprintf(atailored,"transfile = fopen(trans_file,\"w\");\n");
    fprintf(atailored,"fscanf(stripped,\"%%s\",input);\n");
    fprintf(atailored,"while(strcmp(END,input)!=0)\n{\n");

    for (i=0;i<num_wd_fields;i++)
    {
        if (fields[i].literal == TRUE)
        {
            fprintf(atailored,"  if  (literal(input)==TRUE)  \n");
            fprintf(atailored,"    fscanf(stripped,\"%%s\",input);\n");
            fprintf(atailored,"  else\n  ");
        }
        if (fields[i].label == TRUE)
        {
            fprintf(aheader,"int  lab_b_length = %d;\n",fields[i].blength);
            fprintf(atailored,"  if  (symtrans(Labeltbl,input,index_to_labels)==TRUE)  \n");
            fprintf(atailored,"    fscanf(stripped,\"%%s\",input);\n");
            fprintf(atailored,"  else\n  ");
        }
        fprintf(atailored,"if(symtrans(%stbl,input,num_fields%s)==TRUE)\n  fscanf(stripped,\"%%s\",input);\n",
                fields[i].name,fields[i].name);
    }  /*  end  for  num_wd_fields  */

    fprintf(atailored,"\nif(strcmp(NOP,input) == FALSE) fscanf(stripped,\"%%s\",input);\n");
    fprintf(atailored,"\nif(input[0] == '+') fprintf(transfile,\" +\\n\");\n");
    fprintf(atailored,"\nelse\nprintf(\"\\nERROR:  symbol  >%%s<  not  defined\\n\",input);\n");
    fprintf(atailored,"  fscanf(stripped,\"%%s\",input);\n");
    fprintf(atailored,"\n}\nfclose(transfile);\nfclose(stripped);\n}/*  end  translate*/\n");


    /*************  make  make_reffile()  *********************/
    fprintf(atailored,"\nmake_reffile()\n{\n");
    fprintf(atailored,"  char  ");
    for (i=0;i<num_wd_fields;i++)
    {
        if (i!=0) fprintf(atailored,",");
        fprintf(atailored,"t%d[%d+1]",i,fields[i].blength);
    }
    fprintf(atailored,";\nlistingfile = fopen(l_file,\"r\");");
    fprintf(atailored,"\ntransfile = fopen(trans_file,\"r\");");
    fprintf(atailored,"\nreffile = fopen(r_file,\"w\");");
    fprintf(atailored,"\nfscanf(listingfile,\"%%[^;];\\n\",line);");
    fprintf(atailored,"\nwhile(strcmp(END,line)!=0)\n{\n");
    fprintf(atailored,"\n  strcat(line,EOL2);");
    fprintf(atailored,"\n  fprintf(reffile,\"%%-50s\",line);");
    fprintf(atailored,"\n  fscanf(transfile,\"%%[^+]+\\n\",line1);");
    fprintf(atailored,"\n  sscanf(line1,\"");
    for (i=0;i<num_wd_fields;i++)
    {
        fprintf(atailored,"%%%ds",fields[i].blength);
    }
    fprintf(atailored,"\",");
    for (i=0;i<num_wd_fields;i++)
    {
        if (i!=0) fprintf(atailored,",");
        fprintf(atailored,"t%d ",i);
    }
    fprintf(atailored,");");

    fprintf(atailored,"\n  fprintf(reffile,\"\\n");
    for (i=0;i<num_wd_fields;i++)
    {
        fprintf(atailored,"%%s  ");
    }
    fprintf(atailored,"\\n\",");
    for (i=0;i<num_wd_fields;i++)
    {
        if (i!=0) fprintf(atailored,",");
        fprintf(atailored,"t%d",i);
    }
    fprintf(atailored,");");

    fprintf(atailored,"\nfscanf(listingfile,\"%%[^;];\\n\",line);\n}\n");
    fprintf(atailored,"fprintf(reffile,\"end;\\n\");");
    fprintf(atailored,"\nfclose(transfile);");
    fprintf(atailored,"\nfclose(reffile);");
    fprintf(atailored,"\nfclose(listingfile);");
    fprintf(atailored,"\n}/***  end  make_reffile  ****/\n");


    /*********  make  reverse_comp()  ***********/
    fprintf(atailored,"\nreverse_comp()\n{\n  ");
    fprintf(atailored,"revfile = fopen(r_file,\"w\");\n");
    fprintf(atailored,"transfile = fopen(trans_file,\"r\");\n");
    fprintf(atailored,"for(i=0;i<line_num-1;i++)\n  {\n");
    fprintf(atailored,"fprintf(revfile,\"%%0.6d  \",i);");

    for (i=0;i<num_wd_fields;i++)
    {
        fprintf(atailored,"fscanf(transfile,\"%%%ds\",input);\n",
                fields[i].blength);
        if (fields[i].label == TRUE)
        {
            /* The ';' after the generated if is in the original listing and in
               the generated code of Appendix D; it makes the block that follows
               unconditional. */
            fprintf(atailored,"if  (revtrans(%stbl,input,num_fields%s)==FALSE);\n",
                    fields[i].name,fields[i].name);
            fprintf(atailored,"{\nif  (convert(input)  !=  0)\n");
            fprintf(atailored,"fprintf(revfile,\"  %%d(%%s)\",convert(input),input);\n}\n");
        }
        else
        {
            fprintf(atailored,"revtrans(%stbl,input,num_fields%s);\n",
                    fields[i].name,fields[i].name);
        }
    }  /*  end  for  num_wd_fields  */
    fprintf(atailored,"  fscanf(transfile,\"%%s\",input);\n");
    fprintf(atailored,"  fprintf(revfile,\"\\n\");");
    fprintf(atailored,"\n}\nfclose(transfile);\nfclose(revfile);\n}/*  end  reverse  comp*/\n");

}/******  end  build  assembly()  ******/

/******************************************************************

*  DATE:  1  DEC  1987 

*  Version:  1.0 


*  PROCEDURE:  get_micro_format( ) 

*  DESCRIPTION:

*  Reads  the  first  line  of  the  translation  file  and 

*  pulls  out  the  word  format 


* 


*  PASSED  VARIABLES: 

*  RETURNS: 

*  GLOBAL  VARIABLES  USED: 

*  word 

*  fill 

*  wordend 

*  field_name_tmp 

*  fields 

*  empty 

*  GLOBAL  VARIABLES  CHANGED: 

*  word 

*  fill 

*  wordend 

*  field_name_tmp

*  fields 

*  empty 


*  FILES  READ:

*  tranfile

*  FILES  WRITTEN: 

*  MODULES  CALLED: 

*  CALLING  MODULES: 

*  main( ) 

* 

*  AUTHOR:  Robert  S.  Hauser 

*  HISTORY: 

******************************************************************/

get_micro_format()
{
    fscanf(tranfile,"%[^;];%[\ \n]",word,fill);
    printf("Microword  format:  %s  ",word);
    num_wd_fields=0;
    do
    {
        strcpy(wordend,empty);
        sscanf(word,"%s%[\ ]%[^;]",field_name_tmp,fill,wordend);
        strcpy(word,wordend);
        if (index(field_name_tmp,'#')) fields[num_wd_fields].literal=TRUE;
        if (index(field_name_tmp,':')) fields[num_wd_fields].label=TRUE;
        sscanf(field_name_tmp,"%[^#:]",fields[num_wd_fields++].name);
    }/*  for  num  fields  */
    while (strcmp(empty,word)!=0);
}/****  end  get_micro_format  ******/
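
The format-line scan above can be restated without the `sscanf` scansets. The following is only a sketch, not the thesis code: `parse_format`, `struct field_desc`, `MAX_FIELDS` and `MAX_NAME` are hypothetical names, and it assumes the format line is a blank-separated list of field names terminated by `;`, with `#` marking a literal field and `:` marking a label field, as the tests on `field_name_tmp` suggest.

```c
#include <stddef.h>

#define MAX_FIELDS 32   /* hypothetical limit, not taken from the listing */
#define MAX_NAME   32   /* hypothetical limit, not taken from the listing */

struct field_desc {
    char name[MAX_NAME];
    int  literal;       /* 1 if the field name carried a '#' */
    int  label;         /* 1 if the field name carried a ':' */
};

/* Split a format line such as "OP REG ADDR:#;" into field descriptors.
   Returns the number of fields found (at most MAX_FIELDS). */
int parse_format(const char *line, struct field_desc out[])
{
    int n = 0;
    const char *p = line;

    while (*p != '\0' && *p != ';' && n < MAX_FIELDS) {
        size_t k = 0;

        while (*p == ' ') p++;                  /* skip blanks */
        if (*p == '\0' || *p == ';') break;     /* end of format line */

        out[n].literal = 0;
        out[n].label = 0;
        while (*p != '\0' && *p != ' ' && *p != ';') {
            if (*p == '#')
                out[n].literal = 1;             /* marker, not part of name */
            else if (*p == ':')
                out[n].label = 1;               /* marker, not part of name */
            else if (k < MAX_NAME - 1)
                out[n].name[k++] = *p;
            p++;
        }
        out[n].name[k] = '\0';
        n++;
    }
    return n;
}
```

Unlike the listing, which keeps only the characters before the first marker, this sketch drops the markers wherever they occur in the name; the two agree when the markers are written as a suffix.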

/******************************************************************

*  DATE:  1  DEC  1987 

*  Version:  1.0 

*  PROCEDURE:  scan_t_file( ) 


*  DESCRIPTION:

*  Reads  the  translation  file  and  pulls  out  the 

*  field  names  and  put  them  in  t_fields 


*  PASSED  VARIABLES: 

*  RETURNS: 

*  GLOBAL  VARIABLES  USED: 

*  t_fields 

*  GLOBAL  VARIABLES  CHANGED: 

*  t_fields 

*  FILES  READ: 

*  FILES  WRITTEN: 

*  MODULES  CALLED: 

*  get_names ( ) 

*  CALLING  MODULES: 

*  main() 


*  AUTHOR:  Robert  S.  Hauser 

*  HISTORY: 

******************************************************************/


scan_t_file()
{
    num_t_fields=0;
    do
    {
        get_names(t_fields[num_t_fields].name,
                  &t_fields[num_t_fields].blength);
        num_t_fields++;
    }
    while(ch != EOF);
}  /******  end  scanf  t  file  *****/


/******************************************************************

*  DATE:  1  DEC  1987 

*  Version :  1.0 

*  PROCEDURE:  update_blengths()

*  DESCRIPTION: 

*  Determines  the  number  of  bits  in  each  field  and 

*  the  total  number  of  bits  in  the  word. 

* 

*  PASSED  VARIABLES: 

*  RETURNS: 

*  GLOBAL  VARIABLES  USED: 

*  num_t_fields 

*  num_wd_fields 

*  t_fields 

*  bits_in_word 

*  GLOBAL  VARIABLES  CHANGED: 

*  t_fields 

*  bits_in_word 

*  FILES  READ: 

*  FILES  WRITTEN: 

*  MODULES  CALLED: 

*  CALLING  MODULES: 

*  main() 

* 

*  AUTHOR:  Robert  S.  Hauser 

*  HISTORY: 

******************************************************************/

update_blengths()
{
    bits_in_word = 0;
    for (i=0;i<num_t_fields;i++)
        for (j=0;j<num_wd_fields;j++)
            if (strcmp(t_fields[i].name,fields[j].name)==0)
            {
                fields[j].blength = t_f


APPENDIX  D 


Code  Created  by  PREG 


#define  BitsInWord  60
struct  SYMBOL_ENTRY  BR_SELtbl[MaxSubFields];
struct  SYMBOL_ENTRY  ALUtbl[MaxSubFields];
struct  SYMBOL_ENTRY  LOAD_FDtbl[MaxSubFields];
struct  SYMBOL_ENTRY  REGtbl[MaxSubFields];
struct  SYMBOL_ENTRY  SPEC_FUNCTtbl[MaxSubFields];
struct  SYMBOL_ENTRY  WFTOPtbl[MaxSubFields];
struct  SYMBOL_ENTRY  PFADONEtbl[MaxSubFields];
struct  SYMBOL_ENTRY  NXT_ADDRtbl[MaxSubFields];
int  num_fieldsBR_SEL;
int  num_fieldsALU;
int  num_fieldsLOAD_FD;
int  num_fieldsREG;
int  num_fieldsSPEC_FUNCT;
int  num_fieldsWFTOP;
int  num_fieldsPFADONE;
int  num_fieldsNXT_ADDR;
int  lab_b_length = 16;


read_trans_table()
{
    char  fill[100],fill1[100];
    symbolfile = fopen("t_file","r");
    fscanf(symbolfile,"%[^\n]%[\ \n]",fill,fill1);
    readin(BR_SELtbl,&num_fieldsBR_SEL);
    readin(ALUtbl,&num_fieldsALU);
    readin(LOAD_FDtbl,&num_fieldsLOAD_FD);
    readin(REGtbl,&num_fieldsREG);
    readin(SPEC_FUNCTtbl,&num_fieldsSPEC_FUNCT);
    readin(WFTOPtbl,&num_fieldsWFTOP);
    readin(PFADONEtbl,&num_fieldsPFADONE);
    readin(NXT_ADDRtbl,&num_fieldsNXT_ADDR);
    fclose(symbolfile);
}/*  end  read_trans_table  */


translate()
{
    stripped = fopen(strip_file,"r");
    transfile = fopen(trans_file,"w");
    fscanf(stripped,"%s",input);
    while(strcmp(END,input)!=0)
    {
        if(symtrans(BR_SELtbl,input,num_fieldsBR_SEL)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(ALUtbl,input,num_fieldsALU)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(LOAD_FDtbl,input,num_fieldsLOAD_FD)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(REGtbl,input,num_fieldsREG)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(REGtbl,input,num_fieldsREG)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(REGtbl,input,num_fieldsREG)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(SPEC_FUNCTtbl,input,num_fieldsSPEC_FUNCT)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(WFTOPtbl,input,num_fieldsWFTOP)==TRUE)
            fscanf(stripped,"%s",input);
        if(symtrans(PFADONEtbl,input,num_fieldsPFADONE)==TRUE)
            fscanf(stripped,"%s",input);
        if  (literal(input)==TRUE)
            fscanf(stripped,"%s",input);
        else
        if  (symtrans(Labeltbl,input,index_to_labels)==TRUE)
            fscanf(stripped,"%s",input);
        else
        if(symtrans(NXT_ADDRtbl,input,num_fieldsNXT_ADDR)==TRUE)
            fscanf(stripped,"%s",input);

        if(strcmp(NOP,input) == FALSE) fscanf(stripped,"%s",input);

        if(input[0] == '+') fprintf(transfile," +\n");
        else
            printf("\nERROR:  symbol  >%s<  not  defined\n",input);
        fscanf(stripped,"%s",input);
    }
    fclose(transfile);
    fclose(stripped);
}/*  end  translate*/

make_reffile()
{
    char  t0[9+1],t1[4+1],t2[1+1],t3[5+1],t4[5+1],t5[5+1],
          t6[13+1],t7[1+1],t8[1+1],t9[16+1];
    listingfile = fopen(l_file,"r");
    transfile = fopen(trans_file,"r");
    reffile = fopen(r_file,"w");
    fscanf(listingfile,"%[^;];\n",line);
    while(strcmp(END,line)!=0)
    {
        strcat(line,EOL2);
        fprintf(reffile,"%-50s",line);
        fscanf(transfile,"%[^+]+\n",line1);
        sscanf(line1,"%9s%4s%1s%5s%5s%5s%13s%1s%1s%16s",t0,t1,t2,t3,
               t4,t5,t6,t7,t8,t9);
        fprintf(reffile,"\n%s  %s  %s  %s  %s  %s  %s  %s  %s  %s  \n",t0,t1,
                t2,t3,t4,t5,t6,t7,t8,t9);
        fscanf(listingfile,"%[^;];\n",line);
    }
    fprintf(reffile,"end;\n");
    fclose(transfile);
    fclose(reffile);
    fclose(listingfile);
}/***  end  make_reffile  ****/

reverse_comp()
{
    revfile = fopen(r_file,"w");
    transfile = fopen(trans_file,"r");
    for(i=0;i<line_num-1;i++)
    {
        fprintf(revfile,"%0.6d  ",i);fscanf(transfile,"%9s",input);
        revtrans(BR_SELtbl,input,num_fieldsBR_SEL);
        fscanf(transfile,"%4s",input);
        revtrans(ALUtbl,input,num_fieldsALU);
        fscanf(transfile,"%1s",input);
        revtrans(LOAD_FDtbl,input,num_fieldsLOAD_FD);
        fscanf(transfile,"%5s",input);
        revtrans(REGtbl,input,num_fieldsREG);
        fscanf(transfile,"%5s",input);
        revtrans(REGtbl,input,num_fieldsREG);
        fscanf(transfile,"%5s",input);
        revtrans(REGtbl,input,num_fieldsREG);
        fscanf(transfile,"%13s",input);
        revtrans(SPEC_FUNCTtbl,input,num_fieldsSPEC_FUNCT);
        fscanf(transfile,"%1s",input);
        revtrans(WFTOPtbl,input,num_fieldsWFTOP);
        fscanf(transfile,"%1s",input);
        revtrans(PFADONEtbl,input,num_fieldsPFADONE);
        fscanf(transfile,"%16s",input);
        if  (revtrans(NXT_ADDRtbl,input,num_fieldsNXT_ADDR)==FALSE);
        {
            if  (convert(input)  !=  0)
                fprintf(revfile,"  %d(%s)",convert(input),input);
        }
        fscanf(transfile,"%s",input);
        fprintf(revfile,"\n");
    }
    fclose(transfile);
    fclose(revfile);
}/*  end  reverse  comp*/


APPENDIX  E 


Assembler  Skeleton 


/******************************************************************
*

*  DATE:  1  DEC  1987 

*  Version:  1.0 

* 

*  TITLE:  Assembler  Skeleton 

*  FILENAME:  ASSEM.C 

*  COORDINATOR:  Capt  R  W  Linderman 

*  PROJECT:  Generic  Microcode  Assembler  Tool  (GMAT) 

*  OPERATING  SYSTEM:  UNIX  4.3BSD 

*  LANGUAGE:  C 

*  CONTENTS: 

*  readin() 

*  symtrans( ) 

*  literal ( ) 

*  itobs( ) 

*  convert ( ) 

*  revtrans() 

*  vhdl_out ( ) 

*  strip() 

*  make_xromaddrs ( ) 

* 

*  AUTHOR:  Robert  S.  Hauser 

*  HISTORY:
******************************************************************/


#include  <stdio.h>
#include  <math.h>

#define  EOL  "+"
#define  EOL2  ";"
#define  MaxFieldName  100
#define  MaxFileName  100
#define  MaxValueLength  100
#define  MaxSubFields  100
#define  MaxSubFieldLength  100
#define  MaxLineLength  100
#define  MaxLabelLength  100
#define  MaxLabels  100
#define  TRUE  1
#define  FALSE  0

struct  SYMBOL_ENTRY  {
    char  SubField[MaxSubFieldLength];
    char  Value[MaxValueLength];
};


FILE  *revfile,  *symbolfile,  *stripped,  *transfile;
FILE  *reffile,  *xromaddrs,  *infile,  *listingfile;

char  strip_file[MaxFileName],i_file[MaxFileName];
char  l_file[MaxFileName],r_file[MaxFileName];
char  trans_file[MaxFileName],a_file[MaxFileName],v_file[MaxFileName];
char  line[MaxLineLength],line_no_lab[MaxLineLength];
char  line1[MaxLineLength];
char  temp[MaxLineLength],label[MaxLabelLength];
char  LABELarray[MaxLabels][MaxLabelLength];
char  input[MaxSubFieldLength];
char  line_num1[MaxValueLength];
char  slice[MaxValueLength];
char  tchar;
char  remove[50];

char  END[]="end";
char  END2[]="end;";
char  LST[]=".lst";
char  MC[]=".mc";
char  REF[]=".ref";
char  STRIP[]=".strip";
char  TRANS[]=".trans";
char  VHDL[]=".vhd";
char  ADDR[]=".addr";
char  NOP[]="nop";
char  COL[]=":";   /* used with strcat(), so it must be a string */

int  LABELaddr[MaxLabels];
int  i,j,k,index_to_labels;
int  line_num,label_num;
int  xrom_length;
int  num,base,indexa;
int  ch;
int  b_length;
int  generate_xromaddrs,reverse_compile,generate_vhdl;
int  clean;

struct  SYMBOL_ENTRY  Labeltbl[MaxSubFields];

#include  "assem.h"

main(argc,argv)  int  argc;
char  **argv;
{
    line_num = 0;
    label_num = 0;
    index_to_labels = 1;
    generate_xromaddrs = FALSE;
    reverse_compile = FALSE;
    generate_vhdl = FALSE;
    clean = TRUE;

    if ((argc < 2)||(argc > 3))
    {
        printf("\n\nUsage:  assem  file_name  [xvrd]\n\n");
        exit(1);
    }
    if (argc == 3)  /*  then  options  */
    {
        if (index(argv[2],'x') != NULL) generate_xromaddrs = TRUE;
        if (index(argv[2],'v') != NULL) generate_vhdl = TRUE;
        if (index(argv[2],'r') != NULL) reverse_compile = TRUE;
        if (index(argv[2],'d') != NULL) clean = FALSE;
    }

    strcpy(i_file,argv[1]);
    strcpy(l_file,argv[1]);
    strcat(l_file,LST);
    strcpy(r_file,argv[1]);
    strcat(r_file,REF);
    strcpy(strip_file,argv[1]);
    strcat(strip_file,STRIP);
    strcpy(trans_file,argv[1]);
    strcat(trans_file,TRANS);
    strcpy(v_file,argv[1]);
    strcat(v_file,VHDL);
    strcpy(a_file,argv[1]);
    strcat(a_file,ADDR);

    strip();
    read_trans_table();
    translate();
    make_reffile();
    if (generate_xromaddrs)
        make_xromaddrs();
    if (reverse_compile)
        reverse_comp();
    if (generate_vhdl)
        vhdl_out();
    if (clean)
    {
        strcpy(remove," rm ");
        strcat(remove,trans_file);
        strcat(remove," ");
        strcat(remove,strip_file);
        system(remove);
    }
}/*  end  main  program  */
/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  readin()
*  DESCRIPTION:
*  This  procedure  reads  in  the  translation  file  one  char
*  at  a  time  to  get  the  field  name  and  the  value.
*
*  PASSED  VARIABLES:  none
*  RETURNS:
*  t_field  :  pointer  to  structure  for  field
*  num_sub_fields  :  number  of  fields  found
*  GLOBAL  VARIABLES  USED:
*  symbolfile
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  symbolfile  :  program  name  for  the  translation  file
*  FILES  WRITTEN:
*  MODULES  CALLED:
*  CALLING  MODULES:
*  read_trans_table
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/

readin(t_field,num_sub_fields)
struct  SYMBOL_ENTRY  t_field[];
int  *num_sub_fields;
{
    i = 0;
    for(;;)
    {
        k = j = 0;
        ch = fgetc(symbolfile);
        while (ch == ' ') ch = fgetc(symbolfile);  /*  ignore  leading  blanks  */
        if (ch == '\n' || ch == EOF)
        {
            *num_sub_fields = i;
            break;  /*  if  at  end  of  blank  line  or  file  quit  */
        }
        while (ch != '\n' && ch != ' ')  /*  until  EOL  or  blank  */
        {
            t_field[i].SubField[k++] = ch;
            ch = fgetc(symbolfile);
        }
        t_field[i].SubField[k] = '\0';

        while (ch == ' ') ch = fgetc(symbolfile);  /*  skip  blanks  */
        while (ch != '\n' && ch != ' ')
        {
            t_field[i].Value[j++] = ch;
            ch = fgetc(symbolfile);
        }
        t_field[i++].Value[j] = '\0';

        while (ch == ' ') ch = fgetc(symbolfile);  /*  ignore  trailing  blanks  */
    }
}/*  end  readin  */


/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  symtrans()
*  DESCRIPTION:
*  This  procedure  searches  a  table  for  a  symbol.  If  the  symbol
*  is  found  then  the  translation  is  written  out;  if  not,  the
*  default  is  written  out.
*
*  PASSED  VARIABLES:
*  tablename  :  pointer  to  table
*  symbol  :  symbol  to  look  for
*  lentbl  :  length  of  the  table
*  RETURNS:
*  GLOBAL  VARIABLES  USED:
*  transfile
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  FILES  WRITTEN:
*  transfile
*  MODULES  CALLED:
*  CALLING  MODULES:
*  translate()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/


int  symtrans(tablename,symbol,lentbl)  struct  SYMBOL_ENTRY  tablename[];
char  symbol[];
int  lentbl;
{
    int  sindex,found;

    found = FALSE;
    for(sindex=0;(sindex<=lentbl)&&(found==FALSE);)
    {
        if (strcmp(tablename[sindex++].SubField,symbol)==0)
            found = TRUE;
    }
    if (found==FALSE)
    {
        fprintf(transfile,"%s",tablename[0].Value);
        return(FALSE);
    }
    else
    {
        fprintf(transfile,"%s",tablename[sindex-1].Value);
        return(TRUE);
    }
}/*  end  symtrans  */
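
The lookup-with-default rule used by symtrans() can be shown in isolation. This is only an illustrative sketch under assumptions: `lookup_symbol` and `struct symbol_entry` are hypothetical names, the result is returned through a pointer instead of being written to `transfile`, and the loop stops at `i < len` rather than the listing's `sindex <= lentbl`. Entry 0 is taken to hold the field name and its default bit pattern, as in the translation file of Appendix F.

```c
#include <string.h>

struct symbol_entry {
    const char *sub_field;   /* mnemonic as written in the microcode */
    const char *value;       /* bit pattern emitted for that mnemonic */
};

/* Search table[0..len-1] for symbol.  On a match, store the entry's bit
   pattern in *value and return 1.  Otherwise store the default pattern
   (entry 0) and return 0, so the caller knows the symbol was not consumed. */
int lookup_symbol(const struct symbol_entry table[], int len,
                  const char *symbol, const char **value)
{
    int i;

    for (i = 0; i < len; i++) {
        if (strcmp(table[i].sub_field, symbol) == 0) {
            *value = table[i].value;
            return 1;
        }
    }
    *value = table[0].value;
    return 0;
}
```

With the first four BR_SEL entries of Appendix F, looking up `JMP` yields `000000011` and returns 1, while an unknown mnemonic yields the default `000000000` and returns 0, which is what lets translate() fall through to the next field with the same input token.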


/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  literal()
*  DESCRIPTION:
*  This  procedure  determines  if  the  symbol  is  a  literal.  If  so,
*  it  prints  out  the  value.  If  not,  it  returns  FALSE.
*
*  PASSED  VARIABLES:  symbol
*  RETURNS:
*  TRUE  :  if  symbol  was  a  literal
*  FALSE  :  if  symbol  was  not  a  literal
*  GLOBAL  VARIABLES  USED:
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  FILES  WRITTEN:
*  MODULES  CALLED:
*  CALLING  MODULES:
*  translate()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/

int  literal(symbol)  char  symbol[];
{
    int  lindex;

    if (symbol[0] == '#')
    {
        for(lindex=1;lindex<(strlen(symbol));lindex++)
            fprintf(transfile,"%c",symbol[lindex]);
        return(TRUE);
    }
    else
        return(FALSE);
}/*  end  literal  */


/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  itobs()
*  DESCRIPTION:
*  This  procedure  converts  an  integer  to  a  binary  string
*
*  PASSED  VARIABLES:
*  number  :  integer  value
*  lab_b_length  :  label  field  length
*  RETURNS:
*  b_string  :  binary  string
*  GLOBAL  VARIABLES  USED:
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  FILES  WRITTEN:
*  MODULES  CALLED:
*  CALLING  MODULES:
*  strip()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/
itobs(number,b_string,lab_b_length)  int  number;
char  b_string[];  int  lab_b_length;
{
    int  index8;
    int  index9;

    b_string[lab_b_length]='\0';
    for(index9=0,index8=lab_b_length-1;index8>=0;index8--,index9++)
    {
        if ((int)(number/pow((double)2,(double)index8)) >=1)
        {
            b_string[index9]='1';
            number = number - (int)pow((double)2,(double)index8);
        }
        else
        {
            b_string[index9]='0';
        }
    }
}/*  end  itobs  */
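
The same integer-to-binary-string conversion can be written with shifts instead of pow(). The sketch below is an alternative under stated assumptions, not the thesis routine: `int_to_bits` is a hypothetical name, the value is taken as unsigned, and `width` is assumed to be smaller than the number of bits in an `unsigned int`. For values that fit in `width` bits it produces the same string as itobs(); for larger values it keeps the low-order bits, whereas the listing's subtraction loop does not.

```c
/* Write the low `width` bits of `number` into b_string, most significant
   bit first.  b_string must have room for width + 1 characters. */
void int_to_bits(unsigned int number, char b_string[], int width)
{
    int pos;

    for (pos = 0; pos < width; pos++) {
        int bit = width - 1 - pos;              /* bit index for this column */
        b_string[pos] = ((number >> bit) & 1u) ? '1' : '0';
    }
    b_string[width] = '\0';
}
```

Called with a line number of 5 and the 16-bit label width from assem.h, this gives `0000000000000101`, the form stored in Labeltbl by strip().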

/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  convert()
*  DESCRIPTION:
*  This  procedure  returns  the  integer  value  of  the  string  input
*
*  PASSED  VARIABLES:
*  slice  :  binary  string  of  1/0
*  RETURNS:
*  num  :  the  integer  value
*  GLOBAL  VARIABLES  USED:
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  FILES  WRITTEN:
*  MODULES  CALLED:
*  CALLING  MODULES:
*  revtrans()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/
long  convert(slice)  char  slice[];
{
    long  num;
    int  i,j;

    num = 0;
    for(i=strlen(slice)-1,j=0;i>=0;i--,j++)
        if (slice[i]=='1')
            num = num + (int)pow((double)2,(double)j);
    return(num);
}/*  end  convert  */
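
The reverse conversion also has a shift-based form. Again this is only a sketch: `bits_to_long` is a hypothetical name, and, like convert(), it treats every character other than '1' as a zero bit. It assumes the string is no longer than the number of bits in an `unsigned long`.

```c
/* Interpret a string of '0'/'1' characters, most significant bit first,
   as an unsigned value.  Characters other than '1' count as 0 bits. */
unsigned long bits_to_long(const char *slice)
{
    unsigned long num = 0;

    while (*slice != '\0') {
        num = (num << 1) | (*slice == '1' ? 1ul : 0ul);
        slice++;
    }
    return num;
}
```

Scanning left to right avoids the strlen() call and the floating-point pow() of the listing; the JMP pattern `000000011` from Appendix F converts to 3.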

/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  revtrans()
*  DESCRIPTION:
*  This  procedure  takes  a  value  and  prints  the  field  name
*
*  PASSED  VARIABLES:
*  tablename  :  pointer  to  a  table
*  symbol  :  value  to  translate
*  lentbl  :  table  length
*  RETURNS:
*  GLOBAL  VARIABLES  USED:
*  revfile
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  revfile
*  FILES  WRITTEN:
*  MODULES  CALLED:
*  CALLING  MODULES:
*  reverse_comp()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/

int  revtrans(tablename,symbol,lentbl)  struct  SYMBOL_ENTRY  tablename[];
char  symbol[];
int  lentbl;
{
    int  rindex,found;

    found = FALSE;
    for(rindex=0;(rindex<=lentbl)&&(found==FALSE);)
    {
        if (strcmp(tablename[rindex++].Value,symbol)==0)
            found = TRUE;
    }
    if (rindex==1)
        return(FALSE);
    else
    {
        fprintf(revfile,"%s  ",tablename[rindex-1].SubField);
        return(TRUE);
    }
}/*end  revtrans  */

/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  vhdl_out()
*  DESCRIPTION:
*  This  procedure  reads  the  translated  file  and  outputs  a  VHDL
*  description  of  the  ROM.
*  PASSED  VARIABLES:  none
*  RETURNS:
*  GLOBAL  VARIABLES  USED:
*  BitsInWord
*  v_file
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  trans_file  :  file  of  translated  mcode
*  FILES  WRITTEN:
*  vhdl_file  :  VHDL  description
*  MODULES  CALLED:
*  CALLING  MODULES:
*  main()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/

vhdl_out()
{
    FILE  *data,*vhdl_file;
    int  word_num;
    char  xrom_word[BitsInWord+1];
    char  eol[2];

    data = fopen(trans_file,"r");
    vhdl_file = fopen(v_file,"w");

    fprintf(vhdl_file,"package  AN_XROM  is\n");
    fprintf(vhdl_file,"  type  WORD_%d  is  array  (%d  downto  0)  of  bit;\n",
            BitsInWord,BitsInWord-1);
    fprintf(vhdl_file,"  type  ROM_ARRAY  is  array  (0  to  %d)  of  WORD_%d;\n",
            line_num-1,BitsInWord);
    fprintf(vhdl_file,"  function  GETWORD  (WORD_NUMBER  :  integer)\n");
    fprintf(vhdl_file,"    return  WORD_%d  is\n",BitsInWord);
    fprintf(vhdl_file,"  variable  XROM  :  ROM_ARRAY  ;\n");
    fprintf(vhdl_file,"  variable  RETURN_WORD  :  WORD_%d  ;\n",
            BitsInWord);
    fprintf(vhdl_file,"  begin\n");

    word_num = 0;
    while(fscanf(data,"%s%s",xrom_word,eol) != EOF)
    {
        fprintf(vhdl_file,"    XROM(%d)  :=  B\"%s\";\n",
                word_num++,xrom_word);
    }

    fprintf(vhdl_file,"    RETURN_WORD  :=  XROM(WORD_NUMBER);\n");
    fprintf(vhdl_file,"    return  (RETURN_WORD);\n");
    fprintf(vhdl_file,"  end  GETWORD;\n");
    fprintf(vhdl_file,"  end  AN_XROM;\n");
}  /*end  vhdl_out*/


/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  strip()
*  DESCRIPTION:
*  This  procedure  reads  the  microcode,  strips  off  the
*  delimiters,  and  converts  the  labels  into  integer-valued
*  binary  strings.  It  also  makes  a  listing  file  and  checks
*  for  the  existence  of  the  input  file.
*
*  PASSED  VARIABLES:  none
*  RETURNS:
*  GLOBAL  VARIABLES  USED:
*  strip_file
*  l_file
*  i_file
*  line
*  line_num
*  line_no_lab
*  line_num1
*  END2
*  EOL
*  Labeltbl
*  GLOBAL  VARIABLES  CHANGED:
*  line
*  line_num
*  line_no_lab
*  line_num1
*  Labeltbl
*  FILES  READ:
*  infile
*  FILES  WRITTEN:
*  listingfile
*  stripped
*  MODULES  CALLED:
*  CALLING  MODULES:
*  main()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/

strip()
{
    stripped = fopen(strip_file,"w");
    if ((infile = fopen(i_file,"r")) == NULL)
    {
        printf("\nFile  %s  could  not  be  found.\n\n",i_file);
        exit(1);
    }
    listingfile = fopen(l_file,"w");
    fscanf(infile,"%[^;]",line);
    strcat(line,EOL2);
    while(strcmp(END2,line)!=0) {
        if (index(line,':') != NULL)
        {
            sscanf(line,"%[^:]: %[^;]",label,line_no_lab);
            strcpy(Labeltbl[index_to_labels].SubField,label);
            itobs(line_num,line_num1,lab_b_length);
            strcpy(Labeltbl[index_to_labels++].Value,line_num1);
            LABELaddr[label_num] = line_num;
            strcpy(LABELarray[label_num++],label);
        }
        if (index(line,':') != NULL)
        {
            strcat(label,COL);
            fprintf(listingfile,"%0.6d  %-15s%s;\n",line_num++,
                    label,line_no_lab);
            fprintf(stripped,"%s  %s\n",line_no_lab,EOL);
        }
        else
        {
            /* The width of the padding after the line number is not
               recoverable from the scan; it is chosen here to line up
               with the labeled case. */
            fprintf(listingfile,"%0.6d                 %s\n",line_num++,line);
            sscanf(line,"%[^;]",line);
            fprintf(stripped,"%s  %s\n",line,EOL);
        }
        fscanf(infile,"%[^\n]\n",line);
        fscanf(infile,"%[^;]\n",line);
        strcat(line,EOL2);
    }
    fprintf(stripped,"%s\n",END);
    fprintf(listingfile,"%s\n",END2);
    fclose(listingfile);
    fclose(stripped);
}/*end  strip  */


/******************************************************************
*  DATE:  1  DEC  1987
*  Version:  1.0
*  PROCEDURE:  make_xromaddrs()
*  DESCRIPTION:
*  This  procedure  reads  the  translated  file  and  builds  the
*  XROM  compiler  input  file.
*
*  PASSED  VARIABLES:  none
*  RETURNS:
*  GLOBAL  VARIABLES  USED:
*  trans_file
*  BitsInWord
*  GLOBAL  VARIABLES  CHANGED:
*  FILES  READ:
*  trans_file
*  FILES  WRITTEN:
*  xromaddrs
*  MODULES  CALLED:
*  convert()
*  CALLING  MODULES:
*  main()
*
*  AUTHOR:  Robert  S.  Hauser
*  HISTORY:
******************************************************************/


make_xromaddrs()
{
    transfile = fopen(trans_file,"r");
    xromaddrs = fopen(a_file,"w");

    xrom_length = (int)(BitsInWord/4);
    while(fscanf(transfile,"%[^+]+ \n",line) != EOF)
        for(base=0;base<(4*xrom_length);base = base+xrom_length)
        {
            for (indexa=0;indexa<xrom_length;indexa++)
                slice[indexa]=line[base+indexa];
            /* As listed, the terminator index includes base; slice still
               ends at xrom_length because the first pass (base 0) writes
               the '\0' there and later passes never overwrite it. */
            slice[indexa+base]='\0';
            num = convert(slice);
            fprintf(xromaddrs,"%d\n",num);
        }
    fclose(xromaddrs);
    fclose(transfile);
}/*  end  make_xromaddrs  */
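
The slicing step of make_xromaddrs() divides each translated word into four equal slices and writes each slice as a decimal number; with BitsInWord equal to 60 each slice is 15 bits wide. A self-contained sketch of that step follows. It is an illustration only: `split_word` is a hypothetical name, the results are returned in an array instead of being written to the address file, and the length checks are additions that the listing does not make.

```c
#include <stddef.h>
#include <string.h>

/* Split a word of `bits` '0'/'1' characters into four equal slices and
   store the numeric value of each slice in out[0..3].
   Returns 0 on success, -1 if bits is not a positive multiple of 4 or
   the word is shorter than bits characters. */
int split_word(const char *word, int bits, unsigned long out[4])
{
    int slice_len = bits / 4;
    int s, i;

    if (bits <= 0 || bits % 4 != 0 || strlen(word) < (size_t)bits)
        return -1;

    for (s = 0; s < 4; s++) {
        unsigned long num = 0;

        for (i = 0; i < slice_len; i++)          /* MSB first within a slice */
            num = (num << 1) |
                  (unsigned long)(word[s * slice_len + i] == '1');
        out[s] = num;
    }
    return 0;
}
```

For the 8-bit word `00011011` the four 2-bit slices are 0, 1, 2 and 3, in the order in which make_xromaddrs() would print them.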


#include  "assem.tailored"

APPENDIX  F


Microcode  Translation  File 


BR_SEL  ALU  LOAD_FD  REG  REG  REG  SPEC_FUNCT  WFTOP  PFADONE  NXT_ADDR  # 


BR_SEL          000000000
RET             000000001
CALL            000000010
JMP             000000011
CALLCR          000010110
CALLCE          000100110
CALLnZ          001001110
JGE             000111111
JnZ             001001111
JZ              001000111
JnOP            001011111
Jn4DN           001101111
JnWD            001111111
JnECo1          010001111
JnECo2          010011111
JnECo3          010101111
Jn11            010111111
Jn12            011001111
Jn13            011011111
Jn21            011101111
Jn22            011111111
Jn23            100001111
Jn31            100011111
Jn32            100101111
Jn33            100111111
JnEU            101001111
JnEU1           101011111
JnEU2           101101111
JnEU3           101111111
JnEC            110001111
JnEC1           110011111
JnEC2           110101111
JnEC3           110111111
JnPE            111001111
JnPE1           111011111
JnPE2           111101111
JnPE3           111111111

SPEC_FUNCT      0000000000000
Flip            1000000000000
LoadInit        0100000000000
FlipLoadInit    1100000000000
LoadScale       0010000000000
LoadPSi         0001000000000
ShiftTSR        0000100000000
LPSiSTSR        0001100000000
ShiftPSR        0000010000000
LoadELR         0000001000000
HostCntl        0000000100000
LDStateR1       0000000010010
LDStateR2       0000000001010
LDStateR3       0000000000110
LDState         0000000000010
DriveScale      0000000000001

WFTOP           0
WFTop           1

PFADONE         0
PFAdone         1

NXT_ADDR        0000000000000000
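A translation file of this kind amounts to a lookup from mnemonic to bit pattern. The sketch below is an illustration only, not the generated assembler. It assumes the names and codes above pair up in order, with the first code of each field (listed against the field name itself) serving as the default when no mnemonic is given. The struct, table, and function names are hypothetical, and only a handful of BR_SEL entries are copied in.

```c
#include <stddef.h>
#include <string.h>

struct translation {
    const char *mnemonic;
    const char *bits;
};

/* A subset of the BR_SEL entries, under the pairing assumption stated above. */
static const struct translation br_sel_table[] = {
    { "RET",    "000000001" },
    { "CALL",   "000000010" },
    { "JMP",    "000000011" },
    { "CALLCR", "000010110" },
    { "JGE",    "000111111" },
    { "JnZ",    "001001111" },
    { "JZ",     "001000111" },
};

#define BR_SEL_DEFAULT "000000000"

/* Return the bit pattern for a BR_SEL mnemonic, the field default for an
 * empty mnemonic, or NULL if the mnemonic is not in the table. */
const char *translate_br_sel(const char *mnemonic)
{
    size_t n = sizeof br_sel_table / sizeof br_sel_table[0];

    if (mnemonic == NULL || *mnemonic == '\0')
        return BR_SEL_DEFAULT;

    for (size_t i = 0; i < n; i++)
        if (strcmp(br_sel_table[i].mnemonic, mnemonic) == 0)
            return br_sel_table[i].bits;

    return NULL;
}
```

Returning NULL for an unknown mnemonic lets the caller report the error instead of silently emitting the default pattern.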
APPENDIX  G


Microcode  Word  Format 


BITS

0-8   BR_SEL(9) - Branch Select Field

    0-4   CMS(5) - Conditional Mux Select
         0  None
         1  Reconfigure
         2  Error
         3  Negative
         4  Zero
         5  PFAoperate
         6  4DONE
         7  Watch Dog Error (WD_ERR)
         8  Error in Col 1 (ErrCol1)
         9  Error in Col 2 (ErrCol2)
        10  Error in Col 3 (ErrCol3)
        11  Watch Dog Error 11 (WD11)
        12  Watch Dog Error 12 (WD12)
        13  Watch Dog Error 13 (WD13)
        14  Watch Dog Error 21 (WD21)
        15  Watch Dog Error 22 (WD22)
        16  Watch Dog Error 23 (WD23)
        17  Watch Dog Error 31 (WD31)
        18  Watch Dog Error 32 (WD32)
        19  Watch Dog Error 33 (WD33)
        20  EU
        21  ECCU1
        22  ECCU2
        23  ECCU3
        24  EC
        25  ECCC1
        26  ECCC2
        27  ECCC3
        28  PE
        29  PE1
        30  PE2
        31  PE3

    5     BR_ON(1) - Branch On
         0  Positive Logic
         1  Negative Logic

    6-8   NAF(3) - Next Address Field
         0  Continue
         1  Return
         2  Call
         3  Unconditional Branch
         4  Conditional Datapath Load
         5  Conditional Return
         6  Conditional Call
         7  Conditional Branch


9-12  ALU(4) - ALU Function Select
         0  nop
         1  C ← A'
         2  C ← A and B
         3  C ← A xor B
         4  C ← A or B
         5  C ← A (mov)
         6  Set Carry Flag
         7  Reset Carry Flag
         8  C ← A + 1
         9  C ← A - 1
        10  C ← A + B + cy
        11  C ← A + B
        12  not defined
        13  C ← A - B
        14  C ← A - B - br
        15  A - B

13    LOAD_FIELD(1) - Load Next Address Field to C Bus
         0  No Load
         1  Load

14-18 ABUS_SEL(5) - A Bus Select
         0      none
         1-29   Registers 1-19
         30-31  not defined

19-23 BBUS_SEL(5) - B Bus Select
         0      none
         1-29   Registers 1-19
         30-31  not defined

24-28 CBUS_SEL(5) - C Bus Select
         0      none
         1-29   Registers 1-19
         30-31  not defined

29-41 SPEC_FUNCT(13) - Special Functions
      bit
         0  (MSB) Flip Memories
         1  Load Initial Scale Factors
         2  Load Scale Factors from WFTs
         3  Load Problem Status Registers
         4  Shift Temp Scale Register
         5  Shift Permanent Scale Register
         6  Load Error Location Register
         7  Host Control
         8  Load State into Row 1
         9  Load State into Row 2
        10  Load State into Row 3
        11  Load State
        12  Drive Scale Factors

42    WFTOP(1) - WFT Operate
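The field layout above can be checked against the translation file with a small packing routine. This is a minimal sketch, assuming bit 0 of each field is its most significant bit (the format states this only for SPEC_FUNCT). The function names encode_br_sel, spec_funct_bit, and to_bit_string are hypothetical and do not come from the thesis software.

```c
#include <stdio.h>

/* Pack the BR_SEL subfields: CMS in bits 0-4, BR_ON in bit 5, NAF in
 * bits 6-8, with bit 0 taken as the most significant bit (assumption). */
unsigned encode_br_sel(unsigned cms, unsigned br_on, unsigned naf)
{
    return ((cms & 0x1Fu) << 4) | ((br_on & 1u) << 3) | (naf & 7u);
}

/* Mask for SPEC_FUNCT bit n (0..12), where bit 0 is the MSB of the
 * 13-bit field. */
unsigned spec_funct_bit(int n)
{
    return 1u << (12 - n);
}

/* Write the low `width` bits of `value` as '0'/'1' characters, MSB first.
 * `out` must have room for width + 1 characters. */
void to_bit_string(unsigned value, int width, char *out)
{
    for (int i = 0; i < width; i++)
        out[i] = ((value >> (width - 1 - i)) & 1u) ? '1' : '0';
    out[width] = '\0';
}
```

Under these assumptions, CMS = 4 (Zero), BR_ON = 1 (negative logic), and NAF = 7 (conditional branch) pack to 001001111, the pattern listed for JnZ in Appendix F. Setting SPEC_FUNCT bits 8 and 11 gives 0000000010010, the pattern listed for LDStateR1.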
Abstract


High-speed digital signal processing has a wide range of applications, including
radar, sonar, image processing, and target acquisition. The calculation of the Discrete
Fourier Transform (DFT) used in these applications has long been a significant
bottleneck for high-speed processing. Previous AFIT students have adopted a Prime
Factor Algorithm (PFA) method using Winograd Fourier Transform (WFT) processors.
Three WFT processors are pipelined into a system capable of computing a 4080-point
DFT on complex data approximately every 120 microseconds when operating with a 70
MHz clock.

This thesis effort addressed the design and implementation of a PFA controller chip
and interconnecting memory modules between the WFT processors. The PFA controller
is an application specific processor to control the flow of information in the pipeline,
interface to the WFT processors, monitor pipeline status, and take corrective action in
the presence of faults. The interconnecting memory modules buffer the data coming out
of one WFT processor and going into another, allowing concurrent reading and writing.

The PFA controller chip was designed, simulated, and submitted for fabrication
through MOSIS. Twenty-eight 16-bit registers store the pipeline information. An
arithmetic/logic unit (ALU) computes data transformations. A read only memory stores
the microcode. A control sequencer sequences through the proper code segments. Finally,
special circuitry interprets the fault information and reconfigures the pipeline.

This thesis effort included writing a microcode assembler to raise the user interface
to the AFIT-XROM silicon compiler. Raising the user's level of abstraction to
mnemonic microcode, while still providing an error-free path to silicon layout, reduces
chances for error in the microcode specification. A generic microcode assembler tool was
created as an extension for use with other application specific processors. This tool
generates a microcode assembler from a word format and a translation file. The assembler
will output a file compatible with the XROM compiler, a VHDL description of the
XROM, a listing file, a reference file, and a reverse assembly. This tool was tested on two
other AFIT theses and a computer architecture class.

A prototype memory chip was designed and fabricated in 3 micron CMOS through
MOSIS to test the 1-transistor memory cell, the wordline selectors, and the sense
amplifiers. Simulations predict an access time of 10 ns. A larger memory was designed,
simulated, and submitted for fabrication through MOSIS. It contains storage for 272
words of 32 bits each. It is dual ported and permits concurrent reading and writing of
24-bit data. The memory also includes error control circuitry for single error correction
and double error detection.


